digitalmars.D - int nan

reply bearophile <bearophileHUGS lycos.com> writes:
The following comes partially from a friend of mine. If you are busy you can
skip this post of musings.

From the docs:
http://www.digitalmars.com/d/1.0/faq.html#nan
Because of the way CPUs are designed, there is no NaN value for integers, so D
uses 0 instead. It doesn't have the advantages of error detection that NaN has,
but at least errors resulting from unintended default initializations will be
consistent and therefore more debuggable.<

Seeing how abs(int.min) gives problems, and seeing how CPUs manage NaNs of floating-point values efficiently enough, it would be nice for int.min to become the NaN of integers (and similarly for short, long, and maybe tiny too). Such a NaN may also be useful for purposes similar to the nullable integers of C#.

Bye,
bearophile
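
An illustrative sketch of the abs(int.min) problem mentioned above, assuming D's wrapping two's-complement arithmetic for int. The helper `naiveAbs` is a hypothetical name introduced only for this example; it is not a library function.

```d
import std.stdio : writeln;

// Hand-written absolute value, to make the wrap-around visible.
int naiveAbs(int x)
{
    return x < 0 ? -x : x;
}

void main()
{
    // int has one more negative value than it has positive values.
    assert(int.min == -int.max - 1);

    // Negating int.min wraps around to int.min itself,
    // so the "absolute value" comes out negative.
    assert(-int.min == int.min);
    assert(naiveAbs(int.min) < 0);

    writeln("naiveAbs(int.min) = ", naiveAbs(int.min));
}
```

If int.min were reserved as a NaN, every remaining value would have a representable negation, which is the symmetry argument made in the post.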
Jun 26 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 The following comes partially from a friend of mine. If you are busy you can
 skip this post of musings.

 From the docs:
 http://www.digitalmars.com/d/1.0/faq.html#nan
 Because of the way CPUs are designed, there is no NaN value for integers, so D
 uses 0 instead. It doesn't have the advantages of error detection that NaN has,
 but at least errors resulting from unintended default initializations will be
 consistent and therefore more debuggable.<

 Seeing how abs(int.min) gives problems, and seeing how CPUs manage nans of FPs
 efficiently enough, it can be nice for int.min to become the nan of integers (and
 similar for short, long, and maybe tiny too). Such nan may also be useful for
 purposes similar to nullable integers of C#.
 Bye,
 bearophile

This is IMHO (at least at first glance) a reasonable idea in the very long run. However, it isn't practical here and now for D2, because NaN behavior is implemented partly in hardware, and mathematically undefined integer operations throw hardware exceptions instead of returning int.nan on current hardware.
Jun 26 2009
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:h237c9$orl$1 digitalmars.com...
 The following comes partially from a friend of mine. If you are busy you 
 can skip this post of musings.

 From the docs:
 http://www.digitalmars.com/d/1.0/faq.html#nan
Because of the way CPUs are designed, there is no NaN value for integers, 
so D uses 0 instead. It doesn't have the advantages of error detection 
that NaN has, but at least errors resulting from unintended default 
initializations will be consistent and therefore more debuggable.<

 Seeing how abs(int.min) gives problems, and seeing how CPUs manage nans of
 FPs efficiently enough, it can be nice for int.min to become the nan of
 integers (and similar for short, long, and maybe tiny too). Such nan may
 also be useful for purposes similar to nullable integers of C#.

 Bye,
 bearophile

Interesting idea, but IMO using NaN as a default initializer is just a crutch for not having a real system for detecting and preventing reads of uninitialized variables at compile time (C#'s system for this works very well in my experience). I.e., default-initing to NaN is certainly better than default-initing to a commonly-used value, but it still isn't the right long-term solution.

Barring that "correct" solution though, I do think it would make far more sense for the default initializer to be something that isn't as commonly used as 0. So yeah, either int.min, or 0x69696969 or 0xB00BB00B, etc., i.e. something that will actually stand out and scream "Hey! Double-check this! It might not be right!".
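
For reference, a small sketch of the default initializers under discussion, assuming a current D compiler with `std.math.isNaN` available: an int starts at 0, while a floating-point variable starts as NaN and carries that NaN through arithmetic.

```d
import std.math : isNaN;

void main()
{
    int i;      // implicitly initialized to int.init
    double d;   // implicitly initialized to double.init

    assert(int.init == 0);
    assert(i == 0);
    assert(isNaN(d));

    // A forgotten initialization stays visible for floating point...
    double total = d + 1.0;
    assert(isNaN(total));

    // ...but looks like perfectly ordinary data for integers.
    int count = i + 1;
    assert(count == 1);
}
```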
Jun 26 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 Ie, Default initing to NaN is certainly better than 
 default-initing to a commonly-used value, but it still isn't the right 
 long-term solution.

Having a nan has other purposes besides initialization values. You can represent missing values, like C# nullable ints (which are bigger in size: 8 bytes, I think).
 So yea, either int.min, or 0x69696969 or 0xB00BB00B, etc, ie 
 something that will actually stand out and scream "Hey! Double-check this! 
 It might not be right!".

The good thing about using int.min (and short.min, etc.) is that the numbers then become symmetric: you have a positive number for each negative one, and abs() works in all cases.

Bye,
bearophile
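
A software-only sketch of the idea, under the assumption that int.min is reserved as the "nan" bit pattern. The thread asks for hardware support; this only emulates the semantics. `NanInt`, `isNan` and this `abs` are hypothetical names made up for the example.

```d
// Hypothetical wrapper: the bit pattern int.min is reserved to mean "nan".
struct NanInt
{
    int raw = int.min;   // default-initialized to nan

    bool isNan() const { return raw == int.min; }

    // Binary +, - and *: a nan in either operand gives a nan result.
    // Note: an overflowing result is not detected here.
    NanInt opBinary(string op)(NanInt rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        if (isNan || rhs.isNan)
            return NanInt(int.min);
        return NanInt(mixin("raw " ~ op ~ " rhs.raw"));
    }
}

// With int.min excluded, every remaining value has a representable negation.
NanInt abs(NanInt x)
{
    if (x.isNan)
        return x;
    return NanInt(x.raw < 0 ? -x.raw : x.raw);
}

void main()
{
    NanInt missing;            // nan by default
    auto a = NanInt(7);

    assert((a + NanInt(3)).raw == 10);
    assert((a + missing).isNan);
    assert(abs(NanInt(-int.max)).raw == int.max);
    assert(abs(missing).isNan);
}
```

The size stays at 4 bytes, which is the advantage claimed over a separate flag, at the price of one lost value and a check on every operation.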
Jun 27 2009
next sibling parent "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:h250ve$1dvr$1 digitalmars.com...
 Nick Sabalausky:
 Ie, Default initing to NaN is certainly better than
 default-initing to a commonly-used value, but it still isn't the right
 long-term solution.

Having a nan has other purposes beside initialization values. You can represent missing values, like C# nullable ints (that are bigger in size, 8 bytes, I think).

Yes, I know. I only said that "default initing to nan" was a sub-optimal approach, not having nans. But I may have misunderstood you, I thought default init values was what you were talking about?
 So yea, either int.min, or 0x69696969 or 0xB00BB00B, etc, ie
 something that will actually stand out and scream "Hey! Double-check 
 this!
 It might not be right!".

The good thing of using int.min (and short.min, etc) is that then the numbers become symmetric, you have a positive number for each negative one, and abs() works in all cases.

Good point.
Jun 27 2009
prev sibling parent reply grauzone <none example.net> writes:
 Having a nan has other purposes beside initialization values. You can
represent missing values, like C# nullable ints (that are bigger in size, 8
bytes, I think).

You're saying C# nullable ints require more memory than native ints, but just how would you represent int.nan with 32 bits? The correct solution would be to add nullable value types as additional types. It'd be nice if we could have non-nullable object references at the same time. But figuring out and agreeing on a concrete design seems to be too complicated, and D will never have it. "Stop dreaming."
Jun 27 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
grauzone:
 You're saying C# nullable ints require more memory than native ints, but 
   just how would you represent int.nan with 32 bits?

Have you read my posts? I have said to use the value that currently is int.min as null, and I've explained why. I'll keep dreaming some more years, bye, bearophile
Jun 28 2009
parent reply grauzone <none example.net> writes:
 Have you read my posts? I have said to use the value that currently is int.min
as null, and I've explained why.

That wasn't very explicit. Anyway, we need int.min for, you know, doing useful stuff. We can't just define a quite random number to be a special value. Checking math operations for nullable integers would also be quite expensive (you'd have to check both operands before the operation).

If you implement nullable ints as a tuple of a native int and a bool signaling nan, for most operations you only need to OR the nan-bools of both operands and store the result alongside the value. At least I imagine that to be better, because you don't need additional jumps in the generated asm code. And this implementation choice clearly is superior, because it doesn't restrict the value range of the original type. There's no int value that the nullable int type can't represent.

Now there's the space overhead, but if you need performance, you'd restrict yourself to hardware-supported operations anyway.

Although it's pointless to discuss implementation details of a feature that will never be implemented, what do you think?

PS: I'd prefer "checked" math operations (as in C#, I think) over int.nan. Overflows or illegal operations would just trigger exceptions.
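
A minimal sketch of the int-plus-bool representation described above. `FlaggedInt` and its fields are hypothetical names chosen for illustration, and whether the generated code really avoids jumps depends on the compiler.

```d
// Hypothetical nullable int: a native int plus a separate nan flag.
struct FlaggedInt
{
    int value;
    bool isNan;

    FlaggedInt opBinary(string op)(FlaggedInt rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        // Compute the value unconditionally and OR the two flags together.
        return FlaggedInt(mixin("value " ~ op ~ " rhs.value"),
                          isNan | rhs.isNan);
    }
}

void main()
{
    auto sum = FlaggedInt(2) + FlaggedInt(3);
    assert(sum.value == 5 && !sum.isNan);

    // int.min stays an ordinary, usable value.
    auto lowest = FlaggedInt(int.min, false);
    assert(!(lowest - FlaggedInt(0)).isNan);

    // A nan operand makes the result nan.
    auto missing = FlaggedInt(0, true);
    assert((lowest + missing).isNan);

    // The flag costs space beyond the plain int.
    assert(FlaggedInt.sizeof > int.sizeof);
}
```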
Jun 28 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
grauzone:
 That wasn't very explicit. Anyway, we need int.min for, you know, doing 
 useful stuff.

Like for what? Have you used a Lisp? Their tagged integers show that a smaller range is fine. And I'm just talking about 1 value in 4 billion; I don't think you will miss it much. And it's a value that has no symmetric positive.
We can't just define a quite random number to be a special value.<

It's not a random value, it's a specific one, and it's an asymmetric extreme too.
 Checking math operations for nullable integers would also be 
 quite expensive (you had to check both operands before the operation).

I was talking about a hardware-managed nan for ints, shorts, longs and tinys. That's why I called the original post musings.
 Although it's pointless to discuss about implementation details of a 
 feature that will never be implemented, what do you think?

Inventions sometimes come from dreams too :-)
 PS: I'd prefer "checked" math operations (as in C#, I think) over 
 int.nan. Overflows or illegal operations would just trigger exceptions.

I'll do my best to have them in LDC (LLVM supports them already!); it's probably the only new feature I'll ask of the LDC developers. If necessary I may even create a personal version of LDC that has this single extra feature.

Bye,
bearophile
Jun 28 2009
next sibling parent reply ponce <aliloko gmail.com> writes:
I'm sorry but I think it would be an ugly feature.

What would be the NaN of uint ?
What if you actually need 2^32 different values (such as in a linear
congruential random number generator) ?

Besides, there would be no cheap way to ensure NaN propagation (no hardware
support).
Cheers.
Jun 28 2009
parent bearophile <bearophileHUGS lycos.com> writes:
ponce:

 What would be the NaN of uint ?

Having a NaN in just signed integral values (of 1, 2, 4, 8, 16 bytes) looks enough to me, see below.
What if you actually need 2^32 different values (such as in a linear
congruential random number generator) ?<

I agree that there are many situations where you want 2^32 different values, or 2^16, etc. In such situations you can use a utiny/ushort/uint/ulong/ucent that has no nan (and once in a while you may even use a nullable uint like in C#). But I think it's much less common to need 2^32 or 2^64 different signed integers.
Besides, there would be no cheap way to ensure NaN propagation (no hardware
support).<

I was talking about having hardware support, of course. Bye, bearophile
Jun 28 2009
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
bearophile wrote:
 grauzone:
 That wasn't very explicit. Anyway, we need int.min for, you know, doing 
 useful stuff.

Like for what? Have you used a Lisp? Their tagged integers show that a smaller range is fine. And I'm just talking about 1 value in 4 billions, I don't think you will miss it much. And it's a value that has no symmetric positive.

It's fine for Lisp because any Lisp I've ever seen auto-upgrades out-of-range integers to (heap-allocated) bigints.
 PS: I'd prefer "checked" math operations (as in C#, I think) over 
 int.nan. Overflows or illegal operations would just trigger exceptions.

I'll do my best to have them in LDC (LLVM supports them already!), it's probably the only new feature I'll ask to LDC developers. If necessary I may even create a personal version of LDC that has this single extra feature.

I'd like to point out you don't need a new built-in type (or changes to an existing one) to use those LLVM intrinsics with LDC. Just import ldc.intrinsics, define a struct MyInt and overload operators on it using llvm_sadd_with_overflow and friends.

That doesn't work for external libraries of course, but those should be free to handle overflow situations and undefined operations however they want without having to worry about int.nan...
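
A rough sketch of such a wrapper struct. Note the substitution: instead of the LDC-specific intrinsics named in the post, it uses `adds`, `subs` and `muls` from druntime's `core.checkedint`, so that it is not tied to one compiler. `CheckedInt` is a hypothetical name, and throwing a plain `Exception` is just one possible policy.

```d
import core.checkedint : adds, subs, muls;

// Hypothetical wrapper that turns silent wrap-around into an exception.
struct CheckedInt
{
    int value;

    CheckedInt opBinary(string op)(CheckedInt rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        bool overflow = false;
        static if (op == "+")
            int r = adds(value, rhs.value, overflow);
        else static if (op == "-")
            int r = subs(value, rhs.value, overflow);
        else
            int r = muls(value, rhs.value, overflow);

        if (overflow)
            throw new Exception("integer overflow in " ~ op);
        return CheckedInt(r);
    }
}

void main()
{
    assert((CheckedInt(2) + CheckedInt(3)).value == 5);

    bool caught = false;
    try
    {
        auto r = CheckedInt(int.max) + CheckedInt(1);
    }
    catch (Exception e)
    {
        caught = true;
    }
    assert(caught);
}
```

As the post says, this only protects code that opts in to the wrapper type; libraries using plain int are unaffected.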
Jun 29 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Frits van Bommel:
 It's fine for Lisp because any Lisp I've ever seen auto-upgrades out-of-range 
 integers to (heap-allocated) bigints.

I think it can be fine even if you have just fixnums with that single value missing from signed integrals.
 I'd like to point out you don't need a new built-in type (or changes to a 
 existing one) to use those LLVM intrinsics with LDC. Just import
ldc.intrinsics, 
 define a struct MyInt and overload operators on it using
llvm_sadd_with_overflow 
 and friends.
 
 That doesn't work for external libraries of course, but those should be free
to 
 handle overflow situations and undefined operations however they want without 
 having to worry about int.nan...

Probably I have not expressed myself well in this part of my post, because here I was not talking about a new int type or about int nans. I was talking about int overflows. I'll explain better in #ldc.

Bye,
bearophile
Jun 29 2009
prev sibling next sibling parent reply BCS <none anon.com> writes:
Hello Nick,

 Interesting idea, but IMO using NaN as a default initializer is just a
 crutch for not having a real system of compile-time
 detecting/preventing of uninitialized variables from being read (C#'s
 system for this works very well in my experience).

I think you can prove that it is impossible to do this totally correctly:

int i;

for(int j = foo(); j > 0; j--) i = bar(j);   // what if foo() returns -5?
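
A runnable version of this example, with hypothetical bodies for `foo` and `bar` filled in so that the loop never executes. In D the read of `i` afterwards silently yields the default value 0.

```d
int foo() { return -5; }            // hypothetical: makes the loop body unreachable
int bar(int j) { return j * 10; }   // hypothetical

void main()
{
    int i;   // default-initialized to 0 by D
    for (int j = foo(); j > 0; j--)
        i = bar(j);

    // The loop never assigned i, so this reads the default initializer.
    assert(i == 0);
}
```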
Jun 27 2009
next sibling parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
BCS wrote:

 Interesting idea, but IMO using NaN as a default initializer is just a
 crutch for not having a real system of compile-time
 detecting/preventing of uninitialized variables from being read (C#'s
 system for this works very well in my experience).

 I think you can prove that it is impossible to do this totally correctly:

 int i;

 for(int j = foo(); j > 0; j--) i = bar(j);   // what if foo() returns -5?

Complete static analysis of the flow of program control is the holy grail of compiler construction. It would allow automatic proof of many program properties (such as initialization). It may not be impossible, but it is extremely complicated.

If nothing is known about the post-condition of 'foo', the sensible conclusion would be that 'i' may not be initialized after the loop. If you know that the return value of 'foo' is always positive under the given conditions, then you know otherwise.

In the general case, however, you can't guarantee correct static analysis. This leaves a language/compiler with two options, I believe:

* Do nothing about it. Let the programmer use int.min or set a bool to test initialization at runtime.

* Add 'uninitialized' to the set of possible states of each type. Every time a variable is read, assert that it is initialized first. Use the static analysis techniques that *are* available (a set that will continue to grow) to eliminate these tests (and the extended state) where possible.

The first method has the advantage of simplicity for the compiler and better runtime performance in most cases. The second method has the advantage of automatic detection of subtle bugs and more simplicity for the programmer.

-- 
Michiel Helvensteijn
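
The second option can be approximated in library code. The following is only a sketch for simple value types: `Tracked` is a hypothetical name, and a real compiler feature would also eliminate the flag wherever static analysis proves it unnecessary.

```d
// Hypothetical wrapper that adds an "uninitialized" state to a value type.
struct Tracked(T)
{
    private T payload;
    private bool initialized = false;

    // Assigning a plain value marks the variable as initialized.
    void opAssign(T v)
    {
        payload = v;
        initialized = true;
    }

    // Every read first asserts that a value was assigned.
    T get() const
    {
        assert(initialized, "read of uninitialized variable");
        return payload;
    }
}

void main()
{
    Tracked!int i;
    for (int j = 5; j > 3; j--)
        i = j;
    assert(i.get == 4);   // last assignment happened with j == 4

    Tracked!int never;
    // Calling never.get here would fail the assertion
    // (in a build where asserts are enabled).
}
```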
Jun 27 2009
next sibling parent reply superdan <super dan.org> writes:
Michiel Helvensteijn Wrote:

 BCS wrote:
 
 Interesting idea, but IMO using NaN as a default initializer is just a
 crutch for not having a real system of compile-time
 detecting/preventing of uninitialized variables from being read (C#'s
 system for this works very well in my experience).

 I think you can prove that it is impossible to do this totally correctly:

 int i;

 for(int j = foo(); j > 0; j--) i = bar(j);   // what if foo() returns -5?

Complete static analysis of the flow of program control is the holy grail of compiler construction. It would allow automatic proof of many program properties (such as initialization). It may not be impossible, but it is extremely complicated.

extremely complicated? it's machine haltin' dood.
Jun 27 2009
parent Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
superdan wrote:

 Complete static analysis of the flow of program control is the holy grail
 of compiler construction. It would allow automatic proof of many program
 properties (such as initialization). It may not be impossible, but it is
 extremely complicated.

extremely complicated? it's machine haltin' dood.

Ok, since 'complete static analysis' may include undecidable problems such as halting, I agree that in the general case, it's impossible. However, in many practical cases, it may not be. Additionally, the burden of providing loop invariants and ranking functions (to prove termination) could be given to the programmer instead of the compiler. -- Michiel Helvensteijn
Jun 27 2009
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Michiel Helvensteijn wrote:
 * Add 'uninitialized' to the set of possible states of each type. Every time
 a variable is read, assert that it is initialized first. Use the static
 analysis techniques that *are* available (a set that will continue to grow)
 to eliminate these tests (and the extended state) where possible.

I believe this is what valgrind does by instrumenting each variable at runtime.
Jun 27 2009
prev sibling parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Denis Koroskin wrote:

 int i;

 for(int j = foo(); j > 0; j--) i = bar(j);   // what if foo() returns
 -5?

 This code doesn't compile in C# and fails with the following error at first
 attempt to use 'i':

 error CS0165: Use of unassigned local variable 'i'

Ah, so C# is overly conservative. That's another option, of course. It has the advantage of always knowing at compile time that you're not reading an uninitialized value. The disadvantage is that C# will often throw out the baby with the bath water. The example program may be perfectly valid if 'foo' always returns positive. -- Michiel Helvensteijn
Jun 27 2009
parent reply "Nick Sabalausky" <a a.a> writes:
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message 
news:h25fbk$28mg$1 digitalmars.com...
 Denis Koroskin wrote:

 int i;

 for(int j = foo(); j > 0; j--) i = bar(j);   // what if foo() returns
 -5?

This code doesn't compile in C# and fails with the following error at first attempt to use 'i': error CS0165: Use of unassigned local variable 'i'

Ah, so C# is overly conservative. That's another option, of course. It has the advantage of always knowing at compile time that you're not reading an uninitialized value. The disadvantage is that C# will often throw out the baby with the bath water. The example program may be perfectly valid if 'foo' always returns positive.

Yes, this approach is what I was getting at. In fact, I would (and already have in the past) argue that this is *better* than the "holy grail" approach, because it's based on very simple and easy-to-remember rules. Conversely, the "holy grail" approach leads to difficult-to-predict cases of small, seemingly-innocent changes in one place causing some other code to suddenly switch back and forth between "compiles" and "doesn't compile". Take this modified version of your example:

------------
// Imagine foo resides in a completely different package
int foo() { return 5; }

int i;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;  // Compiles at the moment...
------------

Now make a perfectly acceptable-looking change to foo:
------------
int foo() { return 2; }
------------

And all of a sudden non-local code starts flip-flopping between "compiles" and "doesn't compile".

Additionally, even the "holy grail" approach still has to reduce itself to being overly conservative in certain cases anyway:
------------
int foo()
{
   auto rnd = new RandomGenerator();
   rnd.seed(systemClock);
   return rnd.fromRange(1,10);
}
------------

So, we only have two initial choices:
- Overly conservative (C#-style or "holy grail")
- Overly permissive (current D approach)

And if we choose "overly conservative", then our next choice is:
- Overly conservative with simple, easy-to-use rules (C#-style)
- Overly conservative with complex rules that have seemingly-random non-localized effects ("holy grail")
Jun 27 2009
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Nick Sabalausky" <a a.a> wrote in message 
news:h2623m$73u$1 digitalmars.com...

Additionally, in the C# approach (and this is speaking from personal experience), anytime you do come across a provably-correct case that the compiler rejects, not only is it always obvious to see why the compiler rejected it, but it's also trivially easy to fix. So in practice, it's really not much of a "baby with the bathwater" situation at all.
Jun 27 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Nick Sabalausky wrote:

 Yes, this approach is what I was getting at. In fact, I would (and
 already have in the past) argue that this is *better* than the "holy
 grail" approach, because it's based on very simple and easy to
 remember rules. Conversely, the "holy grail" approach leads to
 difficult-to-predict cases of small, seemingly-innocent changes in one
 place causing some other code to suddenly switch back and forth between
 "compiles" and "doesn't compile". Take this modified version of your
 example:

 ------------
 // Imagine foo resides in a completely different package
 int foo() { return 5; }

 int i;
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;  // Compiles at the moment...
 ------------

 Now make a perfectly acceptable-looking change to foo:
 ------------
 int foo() { return 2; }
 ------------

 And all of a sudden non-local code starts flip-flopping between
 "compiles" and "doesn't compile".


Better than a flipflop between "runs correctly" and "runs incorrectly", wouldn't you agree? But of course, you're arguing on the other end of the spectrum. Read on.
 Additionally, even the "holy grail" approach still has to reduce itself
 to being overly conservative in certain cases anyway:
 ------------
 int foo()
 {
    auto rnd = new RandomGenerator();
    rnd.seed(systemClock);
    return rnd.fromRange(1,10);
 }
 ------------


I wouldn't call the "holy grail" overly conservative in this instance. The post-condition of 'foo' would simply be (1 <= returnValue <= 10). With no more information than that, the compiler would have to give an error, since 'foo' *may return a value* that results in an uninitialized read of 'i'. That's how it should work. No errors if and only if there is no possible execution path that results in failure, be it uninitialized-read failure, null-dereference failure or divide-by-zero failure.
 So, we only have two initial choices:
 - Overly conservative (C#-style or "holy grail")
 - Overly permissive (current D approach)


I tend to agree with BCS that the programmer should have the last say, unless the compiler can absolutely prove that (s)he is wrong. Given the choice between overly conservative and overly permissive, I would pick overly permissive. But the beauty of the holy grail is that it's neither.
 Additionally, in the C# approach (and this is speaking from personal
 experience), anytime you do come across a provably-correct case that the
 compiler rejects, not only is it always obvious to see why the compiler
 rejected it, but it's also trivially easy to fix. So in practice, it's
 really not much of a "baby with the bathwater" situation at all.

But what would the fix be in the case of our example? Surely you're not suggesting initializing 'i' to 0? Then we'd be back in the old situation where we might get unexpected runtime behavior if we were wrong about 'foo'. An acceptable solution would be:

int i;
assert(foo() > 3);
for(int j = foo(); j > 3; j--) i = j;
auto k = i;  // Compiles at the moment...

Would C# swallow that?

-- 
Michiel Helvensteijn
Jun 28 2009
next sibling parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Simen Kjaeraas wrote:

 But the beauty of the holy grail is that it's neither.

While the ugliness of it is that it's both.

Care to elaborate? -- Michiel Helvensteijn
Jun 28 2009
parent Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Simen Kjaeraas wrote:

 But the beauty of the holy grail is that it's neither.

While the ugliness of it is that it's both.

Care to elaborate?

As has already been mentioned, one of the biggest problems with the holy grail is that it leads to capricious states of "possibly compilable". There are also bunches of examples in which it will not be able to deduce if it should compile or not, at least not without breaking modularity,

The modularity thing is a good point. I assume you're talking about encapsulation. The designer of a function should make its definition public. The stuff it requires and the stuff it guarantees. The stuff it requires can be of the form of a logical precondition. The stuff it guarantees could be, at the choice of the designer, the function body itself or a logical postcondition (with access to the initial state of the function). The postcondition is used if you want to encapsulate the function implementation. Remember that the definition should be known to the caller of the function anyway, or why would he/she call it? Often this is in the form of documentation, but ideally it would be in an assertion language the compiler can understand.
 and even then, functions called from outside sources (dlls, 
 SOs, OS functions, compiled libraries, etc) will break the system.

You're right. If nothing is known about them, they must automatically receive the weakest possible postcondition: true. Pretty much anything can happen if you call them. However, it's acceptable for either the designers of those outside functions or other programmers to supply public contracts for them. The correctness of the code on the calling side would then be contingent upon the correctness of those contracts. An acceptable compromise.
 This means the system has to be either permissive or conservative when
 encountering an problem insoluble to its logic, and this fall-back
 mechanism will then work counter-intuitively to its normal working
 order, thus giving birth to the system's dualism of both
 conservativeness and permissiveness.

Well, it would still be either one or the other. Not both. Or perhaps I still don't understand your point. I do find this topic fascinating. -- Michiel Helvensteijn
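
D's existing contract syntax already gives a place to publish such pre- and postconditions next to the function signature. The sketch below assumes a reasonably recent compiler (older ones spell `do` as `body`); the function bodies are hypothetical, and these contracts are checked at run time rather than used for static proofs.

```d
// Postcondition from the discussion: 1 <= returnValue <= 10.
int foo()
out (result)
{
    assert(1 <= result && result <= 10);
}
do
{
    return 7;   // hypothetical body; callers rely only on the contract
}

// Precondition states what the loop needs in order to assign i at least once.
int lastAssigned(int start)
in
{
    assert(start > 3);
}
out (result)
{
    assert(result == 4);
}
do
{
    int i;
    for (int j = start; j > 3; j--)
        i = j;
    return i;
}

void main()
{
    // foo's published range does not imply start > 3 in general,
    // so a verifier would have to reject this call; here it holds
    // only because of the hypothetical body.
    assert(lastAssigned(foo()) == 4);
}
```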
Jun 28 2009
prev sibling next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Michiel Helvensteijn wrote:
 But what would the fix be in the case of our example? Surely you're not
 suggesting initializing 'i' to 0? Then we'd be back in the old situation
 where we might get unexpected runtime behavior if we were wrong about
 'foo'. An acceptable solution would be:

 int i;
 assert(foo() > 3);
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;  // Compiles at the moment...

 Would C# swallow that?

Of course not:

int foo() { return rand() % 10; }
Jun 28 2009
parent Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Ary Borenszweig wrote:

 int i;
 assert(foo() > 3);
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;  // Compiles at the moment...
 
 Would C# swallow that?

Of course not: int foo() { return rand() % 10; }

My mistake. For some reason I was assuming 'foo' was pure.

int i;
int j = foo();
assert(j > 3);
for(; j > 3; j--) i = j;
auto k = i;

Would C# allow this?

-- 
Michiel Helvensteijn
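
To illustrate why the value has to be captured once, here is a sketch with a deliberately impure, hypothetical `foo` whose result changes between calls. It says nothing about what the C# compiler accepts; it only shows the run-time behavior in D.

```d
int calls = 0;

// Hypothetical impure foo: returns 5 on the first call and 2 afterwards.
int foo() { return calls++ == 0 ? 5 : 2; }

// Assert on the same value that drives the loop.
int lastAssigned(int start)
{
    assert(start > 3, "loop would not run, i would stay unassigned");
    int i;
    for (int j = start; j > 3; j--)
        i = j;
    return i;
}

void main()
{
    // Asserting on one call and looping on another checks the wrong value:
    assert(foo() > 3);        // first call sees 5
    int second = foo();       // second call sees 2
    assert(!(second > 3));    // a loop started from this would never run

    // Capturing the value once keeps the check and the loop consistent.
    assert(lastAssigned(5) == 4);
    assert(lastAssigned(4) == 4);
}
```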
Jun 28 2009
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message 
news:h2810s$hl1$1 digitalmars.com...
 Nick Sabalausky wrote:

 Yes, this approach is what I was getting at. In fact, I would (and
 already have in the past) argue that this is *better* than the "holy
 grail" approach, because it's based on very simple and easy to
 remember rules. Conversely, the "holy grail" approach leads to
 difficult-to-predict cases of small, seemingly-innocent changes in one
 place causing some other code to suddenly switch back and forth between
 "compiles" and "doesn't compile". Take this modified version of your
 example:

 ------------
 // Imagine foo resides in a completely different package
 int foo() { return 5; }

 int i;
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;  // Compiles at the moment...
 ------------

 Now make a perfectly acceptable-looking change to foo:
 ------------
 int foo() { return 2; }
 ------------

 And all of a sudden non-local code starts flip-flopping between
 "compiles" and "doesn't compile".


Better than a flipflop between "runs correctly" and "runs incorrectly", wouldn't you agree? But of course, you're arguing on the other end of the spectrum. Read on.
 Additionally, even the "holy grail" approach still has to reduce itself
 to being overly conservative in certain cases anyway:
 ------------
 int foo()
 {
    auto rnd = new RandomGenerator();
    rnd.seed(systemClock);
    return rnd.fromRange(1,10);
 }
 ------------


I wouldn't call the "holy grail" overly conservative in this instance. The post-condition of 'foo' would simply be (1 <= returnValue <= 10). With no more information than that, the compiler would have to give an error, since 'foo' *may return a value* that results in an uninitialized read of 'i'. That's how it should work. No errors if and only if there is no possible execution path that results in failure, be it uninitialized-read failure, null-dereference failure or divide-by-zero failure.
 So, we only have two initial choices:
 - Overly conservative (C#-style or "holy grail")
 - Overly permissive (current D approach)


I tend to agree with BCS that the programmer should have the last say, unless the compiler can absolutely prove that (s)he is wrong. Given the choice between overly conservative and overly permissive, I would pick overly permissive. But the beauty of the holy grail is that it's neither.
 Additionally, in the C# approach (and this is speaking from personal
 experience), anytime you do come across a provably-correct case that the
 compiler rejects, not only is it always obvious to see why the compiler
 rejected it, but it's also trivially easy to fix. So in practice, it's
 really not much of a "baby with the bathwater" situation at all.

But what would the fix be in the case of our example? Surely you're not suggesting initializing 'i' to 0? Then we'd be back in the old situation where we might get unexpected runtime behavior if we were wrong about 'foo'.

The fix would be one of the following, depending on what the code is actually doing:

---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.

int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------

---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
    i = j;
    isSet = true;
}
if(isSet) {
    auto k = i;
} else { /* handle the problem */ }
---------------

Also, keep in mind that while, under this mechanism, it is certainly possible for a coder to cause bugs by always knee-jerking the value to zero whenever the compiler complains, that's also a possibility under the "holy grail" approach.
Jun 28 2009
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Nick Sabalausky" <a a.a> wrote in message 
news:h28gqc$1duk$1 digitalmars.com...
 "Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message 
 news:h2810s$hl1$1 digitalmars.com...
 Additionally, in the C# approach (and this is speaking from personal
 experience), anytime you do come across a provably-correct case that the
 compiler rejects, not only is it always obvious to see why the compiler
 rejected it, but it's also trivially easy to fix. So in practice, it's
 really not much of a "baby with the bathwater" situation at all.

But what would the fix be in the case of our example? Surely you're not suggesting initializing 'i' to 0? Then we'd be back in the old situation where we might get unexpected runtime behavior if we were wrong about 'foo'.

The fix would be one of the following, depending on what the code is actually doing:

---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.

int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------

---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
    i = j;
    isSet = true;
}
if(isSet) {
    auto k = i;
} else { /* handle the problem */ }
---------------

Also, keep in mind that while, under this mechanism, it is certainly possible for a coder to cause bugs by always knee-jerking the value to zero whenever the compiler complains, that's also a possibility under the "holy grail" approach.

I would also be perfectly ok with this compiling:

---------------
int foo()
out(ret) { assert(ret >= 5 && ret <= 10); }
body {
    auto rnd = new RandomGenerator();
    rnd.seed(systemClock);
    return rnd.fromRange(5,10);
}

int i;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------

I.e., I can agree that the compiler should be able to take advantage of a function's contract when determining whether or not to throw a "may not get inited" error, but I strongly disagree that the contract used should be implicitly defined by the actual behavior of the function. IMO, in the sans-"out" versions of foo, the *only* post-condition contract is that it returns an int. If foo's creator really does intend for foo's result to always be within a certain subset of that, no matter what revisions are eventually made to foo (without actually changing the whole purpose of foo), then that should be put in a formal post-condition contract anyway, such as above.
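
For readers who want to try the contract idea, here is a minimal sketch in current D. It assumes std.random's uniform as a stand-in for the hypothetical RandomGenerator in the post, and uses the newer `do` keyword in place of `body`. Today's compiler only checks the contract at runtime (and drops it in -release builds); using it for "may not get inited" analysis remains the hypothetical part.

```d
import std.random : uniform;

// Explicit post-condition: callers may rely on 5 <= result <= 10,
// no matter how the body is later rewritten.
int foo()
out (ret) { assert(ret >= 5 && ret <= 10); }
do
{
    return uniform(5, 11); // upper bound is exclusive, so this yields 5..10
}

// The loop from the thread, wrapped in a function so it can be checked.
int lastAboveThree()
{
    int i;
    for (int j = foo(); j > 3; j--) i = j; // runs at least once, since foo() >= 5
    return i;
}

void main()
{
    auto k = lastAboveThree();
    assert(k == 4); // the last value assigned before j reaches 3
}
```

Because the contract guarantees foo() >= 5, the loop body always executes, which is exactly the fact a contract-aware compiler would need in order to accept the read of 'i'.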
Jun 28 2009
parent Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Nick Sabalausky wrote:

 Ie, I can agree that the compiler should be able to take advantage of a
 function's contract when determining whether or not to throw a "may not
 get inited" error, but I strongly disagree that the contract used should
 be implicitly defined by the actual behavior of the function.

Ah, we are starting to agree. :-) However, in some cases, a function is so short and/or so simple that it would be extremely redundant to provide a formal contract. Think about setters, getters and the like. Functions whose implementations are extremely unlikely to change. So while I agree in general that the definition of a function should be its contract - not its implementation - in simple cases, I would find it acceptable for the creator of a function to explicitly indicate that it is defined by its implementation. -- Michiel Helvensteijn
Jun 28 2009
prev sibling next sibling parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Nick Sabalausky wrote:

 The fix would be one of the following, depending on what the code is
 actually doing:
 
 ---------------
 // Instead of knee-jerking i to 0, we default init it to
 // whatever safe value we want it to be if the loop
 // doesn't set it. This, of course, may or may not
 // be zero, depending on the code, but regardless,
 // there are times when this IS perfectly safe.
 
 int i = contextDependentInitVal;
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;
 ---------------
 
 ---------------
 int i;
 bool isSet = false; // making i nullable would be better
 for(int j = foo(); j > 3; j--) {
     i = j;
     isSet = true;
 }
 if(isSet) {
     auto k = i;
 } else { /* handle the problem */ }
 ---------------

Keep in mind that we're talking about a situation in which we're sure 'i' will always be set. If this is not so, the program is incorrect, and we would want to see one error or another. Your first solution would be misleading in that case. Any initial value you choose would be a hack to silence the compiler.

A variation on your second solution then:

int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
    i = j;
    isSet = true;
}
assert(isSet);
auto k = i;

This is the basic solution I would always choose in the absence of the grail. As you say, ideally, the 'uninitialized' state should be part of 'i', not a separate variable. Reading 'i' would then automatically assert its initialization at runtime.

I guess that brings us back to one of those scenarios I mentioned in another subthread. As compilers become more sophisticated, they will be able to remove the explicit initialization, the test and the extended state in more complex situations.
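
The "uninitialized state as part of 'i'" idea can be approximated in library code today. The following is only a sketch, assuming Phobos' std.typecons.Nullable as the nullable wrapper and a fixed foo() as a stand-in for the function under discussion; it is not a claim about how a verifying compiler would implement it.

```d
import std.typecons : Nullable;

// Stand-in for the 'foo' from the thread; any value > 3 makes the loop run.
int foo() { return 5; }

int lastAboveThree()
{
    Nullable!int i;                          // starts in the null state
    for (int j = foo(); j > 3; j--) i = j;   // assignment clears the null state
    // get asserts (in non-release builds) that i is not null,
    // replacing the separate isSet flag and the explicit assert.
    return i.get;
}

void main()
{
    auto k = lastAboveThree();
    assert(k == 4);
}
```

If foo() ever returned a value of 3 or less, the read through get would trip an assertion at runtime instead of silently yielding a default value.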
 Also, keep in mind that while, under this mechanism, it is certainly
 possible for a coder to cause bugs by always knee-jerking the value to
 zero whenever the compiler complains, that's also a possibility under the
 "holy grail" approach.

That's true. But if we did have the grail, the compiler would also be able to see that knee-jerking 'i' would not satisfy the contract of the outer function. Programmers would learn to say what they mean, not what the compiler wants to hear. -- Michiel Helvensteijn
Jun 28 2009
parent reply "Nick Sabalausky" <a a.a> writes:
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message 
news:h28i61$1hl3$1 digitalmars.com...
 Nick Sabalausky wrote:

 The fix would be one of the following, depending on what the code is
 actually doing:

 ---------------
 // Instead of knee-jerking i to 0, we default init it to
 // whatever safe value we want it to be if the loop
 // doesn't set it. This, of course, may or may not
 // be zero, depending on the code, but regardless,
 // there are times when this IS perfectly safe.

 int i = contextDependentInitVal;
 for(int j = foo(); j > 3; j--) i = j;
 auto k = i;
 ---------------

 ---------------
 int i;
 bool isSet = false; // making i nullable would be better
 for(int j = foo(); j > 3; j--) {
     i = j;
     isSet = true;
 }
 if(isSet) {
     auto k = i;
 } else { /* handle the problem */ }
 ---------------

Keep in mind that we're talking about a situation in which we're sure 'i' will always be set. If this is not so, the program is incorrect, and we would want to see one error or another. Your first solution would be misleading in that case. Any initial value you choose would be a hack to silence the compiler.

It's a situation where we're *initially* sure 'i' will always be set. But once we see that error from the compiler, we have to reassess that belief. There are three possibilities when that happens:

1. It will always be set because of the function's contract. In this case, we do the formal contract stuff I advocated earlier. And we can certainly come up with ways to be minimally-verbose with this for trivial cases. So this case gets eliminated.

2. It will always be set, but *only* because of the function's implementation. This *should* cause a compiler error, because if it's allowed by the function's formal contract, then that very fact means that we *should* assume that this may very well flip-flop anytime that either foo or anything foo may rely upon is changed.

3. We were, in fact, *mistaken* in thinking that what we were doing would always leave 'i' inited (this does happen). This causes the programmer to reassess their approach. Depending on what they're trying to do, the solution might involve rewriting a loop that's basically fubared already, or in some cases it may very well be as simple as adding a default init value (this does happen).
 A variation on your second solution then:

 int i;
 bool isSet = false; // making i nullable would be better
 for(int j = foo(); j > 3; j--) {
    i = j;
    isSet = true;
 }
 assert(isSet);
 auto k = i;

 This is the basic solution I would always choose in the absence of the
 grail. As you say, ideally, the 'uninitialized' state should be part
 of 'i', not a separate variable. Reading 'i' would then automatically
 assert its initialization at runtime.

Yea, that works too. It's effectively a sub-case of my "if(isSet) else {/* handle this somehow*/ }", and more-or-less what I had in mind.
 I guess that brings us back to one of those scenarios I mentioned in
 another subthread. As compilers become more sophisticated, they will be
 able to remove the explicit initialization, the test and the extended
 state in more complex situations.

Agreed, but with the caveat that care should be taken to ensure these new rules don't allow non-localized flip-flopping when something's[1] implementation is changed, because then the programmer has to start remembering and analyzing an increasingly complex set of rules.

[1] Side-trip to spell-check land again: Apparently OpenOffice doesn't think "something" can be made possessive. (But then again, maybe it technically can't in super-anal-grammar-police land, not like I would know ;) )
 Also, keep in mind that while, under this mechanism, it is certainly
 possible for a coder to cause bugs by always knee-jerking the value to
 zero whenever the compiler complains, that's also a possibility under the
 "holy grail" approach.

That's true. But if we did have the grail, the compiler would also be able to see that knee-jerking 'i' would not satisfy the contract of the outer function.

No, it wouldn't, because it would have no way of knowing that's a knee-jerk fix for a "using uninited var" error. But maybe I misunderstand you?
Jun 28 2009
parent Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Nick Sabalausky wrote:

 Init, assert, spell-check land, etc.

We're agreed.
 Also, keep in mind that while, under this mechanism, it is certainly
 possible for a coder to cause bugs by always knee-jerking the value to
 zero whenever the compiler complains, that's also a possibility under
 the "holy grail" approach.

That's true. But if we did have the grail, the compiler would also be able to see that knee-jerking 'i' would not satisfy the contract of the outer function.

No, it wouldn't, because it would have no way of knowing that's a knee-jerk fix for a "using uninited var" error. But maybe I misunderstand you?

I mean that if the programmer has provided a postcondition of the outer function (the function that contains the variable 'i'), a verifying compiler will be able to give an error if knee-jerking 'i' results in a subtle bug in the function; one that would invalidate the postcondition. Of course, if no postcondition is supplied, the compiler can only assume you meant for exactly that thing to happen. The bug becomes a feature. :-)

Anyway, the verifying compiler is a project I'm working on. I'm designing a language based on the assumption that compilers will become more and more sophisticated in the area of static analysis. Contracts are the most important feature of this language and assertions even have their own syntax (because they'll be used so much). Where the correctness of a piece of code cannot be proved at compile-time, a managed runtime environment is used. This offers the guarantee that the current state will always satisfy the contract. Assertions cannot be 'caught' and discarded. Many optimizations may also be based on contracts. It's a really fun project.

-- Michiel Helvensteijn
Jun 28 2009
prev sibling parent BCS <none anon.com> writes:
Hello Nick,

 
 Also, keep in mind that while, under this mechanism, it is certainly
 possible for a coder to cause bugs by always knee-jerking the value to
 zero whenever the compiler complains, that's also a possibility under
 the "holy grail" approach.
 

How about letting the user signal that they know what they are doing by using:

int i = void;
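
For reference, D already accepts the void initializer on local variables; what is proposed here is only its use as an opt-out from a hypothetical "may not get inited" check. A small sketch of the existing behavior (the function name is made up for illustration):

```d
// Contrast default initialization with an explicit void initializer.
int sumOfBoth()
{
    int a;          // default-initialized to int.init, which is 0
    int b = void;   // explicitly left uninitialized; garbage until assigned
    b = 42;         // must be written before any read
    return a + b;
}

void main()
{
    assert(sumOfBoth() == 42);
}
```

Reading 'b' before the assignment would yield an arbitrary value, so the initializer shifts responsibility to the programmer, which is the point of the suggestion.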
Jun 28 2009
prev sibling parent BCS <none anon.com> writes:
Hello Nick,

 "Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in
 message news:h25fbk$28mg$1 digitalmars.com...
 
 Ah, so C# is overly conservative. That's another option, of course.
 
 It has the advantage of always knowing at compile time that you're
 not reading an uninitialized value. The disadvantage is that C# will
 often throw out the baby with the bath water. The example program may
 be perfectly valid if 'foo' always returns positive.
 

Yes, this approach is what I was getting at. In fact, I would (and already have in the past) argue that this is *better* than the "holy grail" approach, because it's based on very simple and easy to remember rules. Conversely, the "holy grail" approach leads to difficult-to-predict cases of small, seemingly-innocent changes in one place causing some other code to suddenly switch back and forth between "compiles" and "doesn't compile".

Yes, trying to solve the problem for all cases won't work, but I think the default should be to trust the programmer. If you can show for sure, with a trivial set of rules, that I use a variable before setting it, give me an error. If not, get the heck out of my way!
Jun 27 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 27 Jun 2009 17:50:11 +0400, BCS <none anon.com> wrote:

 Hello Nick,

 Interesting idea, but IMO using NaN as a default initializer is just a
 crutch for not having a real system of compile-time
 detecting/preventing of uninitialized variables from being read (C#'s
 system for this works very well in my experience).

I think you can prove that it is impossible to do this totally correctly:

int i;
for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?

This code doesn't compile in C#; it fails with the following error at the first attempt to use 'i':

error CS0165: Use of unassigned local variable 'i'
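
For contrast, the same snippet is accepted by D, where 'i' is default-initialized to 0. The sketch below assumes stand-in definitions of foo and bar (neither is given in the thread) and wraps the loop in a made-up function so the result can be checked.

```d
// Hypothetical stand-ins: foo returns the problematic value from the post.
int foo() { return -5; }
int bar(int j) { return j * 2; }

int readAfterLoop()
{
    int i;                                       // D default-initializes to 0
    for (int j = foo(); j > 0; j--) i = bar(j);  // never runs when foo() <= 0
    return i;                                    // silently reads the default
}

void main()
{
    assert(readAfterLoop() == 0);
}
```

The program compiles and runs without complaint, which is the "overly permissive" behavior being weighed against C#'s compile-time rejection.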
Jun 27 2009
parent BCS <none anon.com> writes:
Hello Denis,

 On Sat, 27 Jun 2009 17:50:11 +0400, BCS <none anon.com> wrote:
 
 Hello Nick,
 
 Interesting idea, but IMO using NaN as a default initializer is just
 a crutch for not having a real system of compile-time
 detecting/preventing of uninitialized variables from being read
 (C#'s system for this works very well in my experience).
 

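I think you can prove that it is impossible to do this totally correctly:

int i;
for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?

This code doesn't compile in C# and fails with the following error at first attempt to use 'i': error CS0165: Use of unassigned local variable 'i'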

And if foo() is never <=0 then the error is valid, but incorrect. I like the int.nan idea better. Not one unassigned local variable error I have ever seen has pointed me at a bug.
Jun 27 2009
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Michiel Helvensteijn wrote:

 But the beauty of the holy grail is that it's neither.

While the ugliness of it is that it's both. -- Simen
Jun 28 2009
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Michiel Helvensteijn wrote:

 Simen Kjaeraas wrote:

 But the beauty of the holy grail is that it's neither.

While the ugliness of it is that it's both.

Care to elaborate?

As has already been mentioned, one of the biggest problems with the holy grail is that it leads to capricious states of "possibly compilable". There are also bunches of examples in which it will not be able to deduce if it should compile or not, at least not without breaking modularity, and even then, functions called from outside sources (dlls, SOs, OS functions, compiled libraries, etc) will break the system.

This means the system has to be either permissive or conservative when encountering a problem insoluble to its logic, and this fall-back mechanism will then work counter-intuitively to its normal working order, thus giving birth to the system's dualism of both conservativeness and permissiveness.

-- Simen
Jun 28 2009
prev sibling parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Jun 28, 2009 at 6:02 PM, bearophile<bearophileHUGS lycos.com> wrote:

Besides, there would be no cheap way to ensure NaN propagation (no hardware
support).<

I was talking about having hardware support, of course.

Let me know when x86 gets that.
Jun 28 2009
parent "Nick Sabalausky" <a a.a> writes:
"Jarrett Billingsley" <jarrett.billingsley gmail.com> wrote in message 
news:mailman.315.1246226874.13405.digitalmars-d puremagic.com...
 On Sun, Jun 28, 2009 at 6:02 PM, bearophile<bearophileHUGS lycos.com> 
 wrote:

Besides, there would be no cheap way to ensure NaN propagation (no 
hardware support).<

I was talking about having hardware support, of course.

Let me know when x86 gets that.

Geez, it's a hypothetical discussion, for cryin out loud. Not everything has to be immediately feasible to be worthy of debate.
Jun 28 2009