
digitalmars.D - Null references redux

Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
 <newshound1 digitalmars.com> wrote:
 D has borrowed ideas from many different languages. The trick is to
 take the good stuff and avoid their mistakes <g>.

How about this one:
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/

 :)

I think he's wrong.

Getting rid of null references is like solving the problem of dead canaries in the coal mines by replacing them with stuffed toys.

It all depends on what you prefer a program to do when it encounters a program bug:

1. Immediately stop and produce an indication that the program failed

2. Soldier on and silently produce garbage output

I prefer (1).

Consider the humble int. There is no invalid value such that referencing the invalid value will cause a seg fault. One case is that an uninitialized int is set to garbage, and erratic results follow. Another is that (in D) ints are default initialized to 0. 0 may or may not be what the logic of the program requires, and if it isn't, again, silently bad results follow.

Consider also the NaN value that floats are default initialized to. This has the nice characteristic that you know your results are bad if they are NaN. But it has the bad characteristic that you don't know where the NaN came from. Don corrected this by submitting a patch that enables the program to throw an exception upon trying to use a NaN. Then, you know exactly where your program went wrong.

It is exactly analogous to a null pointer exception. And it's darned useful.
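To make the NaN point concrete, here is a small sketch. Rust is used only for illustration (it separates nullable from non-nullable references, which is the topic of this thread), and the sketch does not use Don's patch or any floating-point trap API. The hypothetical `checked` helper simply reports an error at the first operation that yields a NaN, the "you know exactly where your program went wrong" behavior described above, while `silent_pipeline` lets the NaN reach the output with no hint of its origin.

```rust
/// Plain floating-point arithmetic: a NaN produced early propagates
/// silently through every later step, so the result no longer says
/// where it came from.
fn silent_pipeline(a: f64, b: f64) -> f64 {
    let ratio = a / b; // e.g. 0.0 / 0.0 produces NaN here...
    let scaled = ratio * 2.0; // ...and the NaN flows on unnoticed
    scaled + 1.0
}

/// Hypothetical helper: stop at the first step that produces a NaN and
/// name that step, instead of letting the NaN flow onward.
fn checked(step: &str, value: f64) -> Result<f64, String> {
    if value.is_nan() {
        Err(format!("NaN produced at step `{}`", step))
    } else {
        Ok(value)
    }
}

/// The same computation, checked after every operation.
fn checked_pipeline(a: f64, b: f64) -> Result<f64, String> {
    let ratio = checked("divide", a / b)?;
    let scaled = checked("scale", ratio * 2.0)?;
    checked("offset", scaled + 1.0)
}
```

With `0.0, 0.0` as input, `silent_pipeline` returns NaN, while `checked_pipeline` returns an error naming the `divide` step.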
Sep 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 
 
  >
  >
  > :)
 
 I think he's wrong.
 
 Getting rid of null references is like solving the problem of dead 
 canaries in the coal mines by replacing them with stuffed toys.
 
 It all depends on what you prefer a program to do when it encounters a 
 program bug:
 
 1. Immediately stop and produce an indication that the program failed
 
 2. Soldier on and silently produce garbage output
 
 I prefer (1).
 
 Consider the humble int. There is no invalid value such that referencing 
 the invalid value will cause a seg fault. One case is an uninitialized 
 int is set to garbage, and erratic results follow. Another is that (in 
 D) ints are default initialized to 0. 0 may or may not be what the logic 
 of the program requires, and if it isn't, again, silently bad results 
 follow.
 
 Consider also the NaN value that floats are default initialized to. This 
 has the nice characteristic of you know your results are bad if they are 
 NaN. But it has the bad characteristic that you don't know where the NaN 
 came from. Don corrected this by submitting a patch that enables the 
 program to throw an exception upon trying to use a NaN. Then, you know 
 exactly where your program went wrong.
 
 It is exactly analogous to a null pointer exception. And it's darned 
 useful.

My assessment: the chances of convincing Walter he's wrong are quite slim... Having a rationale for being wrong is very hard to overcome.

Andrei
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
My assessment: the chances of convincing Walter he's wrong are quite slim... Having a rationale for being wrong is very hard to overcome. Andrei

I actually side with Walter here. I much prefer my programs to crash on using a null reference and fix the issue than add runtime overhead that does the same thing. In most cases a simple backtrace is enough to pinpoint the location of the bug.

Null references are useful to implement optional arguments without the overhead of an Optional!T wrapper. If you disallow null references, what would "Object foo;" initialize to then?

Jeremie
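For comparison, a language that builds the optional wrapper into its type system need not pay for it. The Rust sketch below (Rust chosen only for illustration) uses `Option<&T>`, which the Rust standard library documentation guarantees to be the same size as `&T`, because `None` is represented by the null pointer. `Object` and `describe` are hypothetical names; this is not D's `Optional!T`.

```rust
use std::mem::size_of;

struct Object {
    id: u32,
}

/// An optional argument expressed as Option<&Object>: `None` plays the
/// role a null reference plays in D, but both caller and callee see it
/// in the signature.
fn describe(obj: Option<&Object>) -> String {
    match obj {
        Some(o) => format!("object #{}", o.id),
        None => String::from("no object"),
    }
}

/// Option<&T> stores None as the null pointer, so the wrapper adds no
/// size over a bare reference.
fn wrapper_is_free() -> bool {
    size_of::<Option<&Object>>() == size_of::<&Object>()
}
```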
Sep 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
My assessment: the chances of convincing Walter he's wrong are quite slim... Having a rationale for being wrong is very hard to overcome. Andrei

I actually side with Walter here. I much prefer my programs to crash on using a null reference and fix the issue than add runtime overhead that does the same thing. In most cases a simple backtrace is enough to pinpoint the location of the bug.

But that's a false choice. You don't choose between a crashing program and an out-of-control program. This is the fallacy. The problem is that, the way Walter puts it, it's darn appealing. Who would want a subtly incorrect program?
 Null references are useful to implement optional arguments without any 
 overhead by an Optional!T wrapper. If you disallow null references what 
 would "Object foo;" initialize to then?

The default should be non-nullable references. You can define nullable references if you so wish. The problem is, Walter doesn't realize that the default initialization scheme and the optional lack thereof by using "= void" go straight against his reasoning about null objects.

Andrei
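A minimal sketch of "non-nullable by default, nullable on request", using Rust for illustration: a `&Widget` parameter can never be null, so the callee uses it without a check, while a parameter that may be absent has to be declared `Option<&Widget>`, and the compiler rejects any use that skips the `None` case. `Widget`, `title` and `title_if_present` are hypothetical names.

```rust
struct Widget {
    name: String,
}

/// Non-nullable by default: a `&Widget` always points at a live Widget,
/// so no null check is needed (or possible).
fn title(w: &Widget) -> String {
    format!("[{}]", w.name)
}

/// Nullable by request: `Option<&Widget>` may be absent, and the compiler
/// will not let the body reach `.name` without handling `None` first.
fn title_if_present(w: Option<&Widget>) -> Option<String> {
    match w {
        Some(w) => Some(title(w)),
        None => None, // the "null" case is explicit, not an accident
    }
}
```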
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 But that's a false choice. You don't choose between a crashing program 
 and an out-of-control program. This is the fallacy. The problem is the 
 way Walter puts it it's darn appealing. Who would want a subtly 
 incorrect program?

Oh, I've heard people argue for them.
 Null references are useful to implement optional arguments without any 
 overhead by an Optional!T wrapper. If you disallow null references 
 what would "Object foo;" initialize to then?

The default should be non-nullable references. You can define nullable references if you so wish. The problem is, Walter doesn't realize that the default initialization scheme and the optional lack thereof by using "= void" goes straight against his reasoning about null objects.

If there was a reasonable way of having NaN values for ints, D would use them. So we're stuck with a less than ideal solution, which is default initializing them to 0. At least you get repeatable results from that, rather than randomly wrong ones.

"=void" is justifiable in certain optimization cases. D is, after all, a systems programming language with back doors there when you need them.

The problem with non-nullable references is what do they default to? Some "nan" object? When you use a "nan" object, what should it do? Throw an exception?

The problem with null references is not the canary dying, it's that there's a logic error in the user's code. Putting a gas mask on the canary keeps it from dying, but the gas is still seeping in, you just don't know it.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
language_fan wrote:
 Sat, 26 Sep 2009 14:49:45 -0700, Walter Bright thusly wrote:
 
 The problem with non-nullable references is what do they default to?
 Some "nan" object? When you use a "nan" object, what should it do?
 Throw an exception?

Well typically if your type system supports algebraic types, you can define a higher order Optional type as follows:

    type Optional T = Some T | Nothing

Now a safe nullable reference type would look like

    Optional (T*)

The whole point is to make null pointer tests explicit.

But people are objecting to having to test for null pointers.
 You can pass 
 around the optional type freely, and only on the actual use site you need 
 to pattern match it to see if it's a null pointer:
 
   void foo(SafeRef[int] a) {
     match(a) {
       case Nothing => // handle null pointer
       case Some(b) => return b + 2;
     }
   }
 
 The default initialization of this type is Nothing.

I don't see the improvement.
 Some data structures can be initialized in a way that null pointers don't 
 exist. In these cases you can use a type that does not have the 'Nothing' 
 form. This can lead to nice optimizations. There is no default value, 
 cause default initialization can never occur.

Seems like a large restriction on data structures to build that requirement into the language. It would also make it difficult to transfer code from Java or C++ to D.
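language_fan's pattern-matching pseudocode carries over almost directly to Rust (used here only as an illustration), whose built-in `Option<T>` is the `Some T | Nothing` type with `Nothing` spelled `None`. The sketch assumes the hypothetical `SafeRef[int]` means an optional reference to an `int` (`Option<&i32>`), and that the null branch should report "no value" (`None`) rather than invent a number; the pseudocode leaves that branch unspecified.

```rust
/// `SafeRef[int]` from the pseudocode, modelled as an optional reference.
/// Rust's built-in Option<T> is the `Some T | Nothing` type (spelled None).
type SafeRef<'a> = Option<&'a i32>;

/// The pseudocode `foo`: the `match` must cover both variants or the
/// function does not compile, so the null test cannot be forgotten.
fn foo(a: SafeRef<'_>) -> Option<i32> {
    match a {
        None => None, // handle null pointer: report "no value"
        Some(b) => Some(*b + 2),
    }
}

/// The default value of an optional reference is None ("Nothing").
fn default_ref<'a>() -> SafeRef<'a> {
    Default::default()
}
```

A plain `&i32`, by contrast, does not implement `Default` at all, which corresponds to the quoted point that a reference which can never be `Nothing` simply has no default value.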
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 On Sun, 27 Sep 2009 01:49:45 +0400, Walter Bright 
 <newshound1 digitalmars.com> wrote:
 
 The problem with non-nullable references is what do they default to? 
 Some "nan" object? When you use a "nan" object, what should it do? 
 Throw an exception?

Oh, my! You don't even know what a non-null default is! There is a Null Object pattern (http://en.wikipedia.org/wiki/Null_Object_pattern) - I guess that's what you are talking about, when you mean "nan object" - but it has little to do with non-null references.

It's the black hole object. It prevents you from getting a seg fault, but I see no rationale for expecting that an unexpected null object always returning "I succeeded" means your program will operate correctly. The white hole object, of course, always throws an exception when it is accessed. At least you know something went wrong - but you already have that with null.
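The two null-object flavors can be sketched side by side (in Rust, for illustration only; `PrimeSource`, `BlackHole`, `WhiteHole` and `is_valid` are hypothetical names). The black hole answers every call with a default, so a program expecting a prime number > 8 quietly gets 0; the white hole fails the moment it is used, which is what a null dereference already does.

```rust
/// Something the program expects to yield a prime greater than 8.
trait PrimeSource {
    fn next_prime(&self) -> u64;
}

/// "Black hole" null object: accepts every call and returns a default,
/// so a missing object quietly turns into a wrong answer.
struct BlackHole;

impl PrimeSource for BlackHole {
    fn next_prime(&self) -> u64 {
        0 // not a prime, and nothing tells the caller
    }
}

/// "White hole" null object: fails as soon as it is used.
struct WhiteHole;

impl PrimeSource for WhiteHole {
    fn next_prime(&self) -> u64 {
        panic!("used a null PrimeSource")
    }
}

/// What the calling logic actually requires: a prime greater than 8.
fn is_valid(p: u64) -> bool {
    p > 8 && (2..p).take_while(|d| d * d <= p).all(|d| p % d != 0)
}
```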
 With non-null references, you don't have "wrong values", that throw an 
 exception upon use (although it's clearly possible), you get a correct 
 value.

You're not getting a correct value, you're getting another default value. If the logic of your program is expecting a prime number > 8, and the null object returns 0, now what?
 If an object may or may not have a valid value, you mark it as nullable. 
 All the difference is that it's a non-default behavior, that's it. And a 
 user is now warned, that an object may be not initialized.

He isn't warned, that's just the problem. The null object happily says "I succeeded" for all input and returns more default values and null objects. What happens if the output of your program then becomes a null object? How are you going to go about tracing that back to its source? That's a lot harder than working backwards from where a null exception originated.

I used to work at Boeing designing critical flight systems. Absolutely the WRONG failure mode is to pretend nothing went wrong and happily return default values and show lovely green lights on the instrument panel. The right thing is to immediately inform the pilot that something went wrong and INSTANTLY SHUT THE BAD SYSTEM DOWN before it does something really, really bad, because now it is in an unknown state. The pilot then follows the procedure he's trained to, such as engaging the backup.

A null pointer exception fits right in with that philosophy.

You could think of null exceptions like pain - sure it's unpleasant, but people who feel no pain constantly injure themselves and don't live very long. When I went to the dentist as a kid for the first time, he shot my cheek full of novocaine. After the dental work, I went back to school. I found to my amusement that if I chewed on my cheek, it didn't hurt. Boy was I sorry about that later <g>.
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Walter Bright wrote:
 You could think of null exceptions like pain - sure it's unpleasant, but people who feel no pain constantly injure themselves and don't live very long. When I went to the dentist as a kid for the first time, he shot my cheek full of novocaine. After the dental work, I went back to school. I found to my amusement that if I chewed on my cheek, it didn't hurt. Boy was I sorry about that later <g>.

Haha, that's a nice analogy; I myself was just unable to speak. I guess that's what you call undefined behavior!

That's exactly the point with nonnull references, they turn access violations or segfaults into undefined behavior, or worse into generic behavior that's much harder to track to its source.

I think nonnull references are a nice concept for languages that have a higher level than D. If I expect references to never be null I just don't check for null before using them, and let the code crash which gives me a nice crash window with a backtrace in my runtime.

Jeremie
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 7:21 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 That's exactly the point with nonnull references, they turn access
 violations or segfaults into undefined behavior, or worse into generic
 behavior that's much harder to track to its source.

 I think nonnull references are a nice concept for languages that have a
 higher level than D. If I expect references to never be null I just don't
 check for null before using them, and let the code crash which gives me a
 nice crash window with a backtrace in my runtime.

You're missing the point. You wouldn't have "undefined behavior at runtime" with nonnull references because there would be NO POINT in having nonnull references without ALSO having nullable references. Could your reference be null? Use a nullable reference. Is your reference never supposed to be null? Use a nonnull reference. End of problem. You do not create "null objects" and store them in a nonnull reference which you then check at runtime. You use a nullable reference which the language *forces* you to check before use.

I don't want the language to force me to check nullable references before using them, that just takes away a lot of optimization cases. You could just use the casting system to sneak null into a nonnull reference and bam, undefined behavior. And you could have nullables which are always nonnull at some point in time within a scope but your only way out of the compiler errors about using a nullable without first testing it for nullity is to use excessive casting.
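On the casting concern: languages built around non-nullable references usually let a nullable value be checked once and then bound to a non-null name for the rest of the scope, so no casts are needed. The Rust sketch below (Rust chosen only for illustration; `Config`, `describe` and `must_have` are hypothetical) shows this with `if let`, plus `expect`, an explicit "I know this is not null" assertion that stops the program on that line if the assumption is wrong.

```rust
struct Config {
    path: String,
}

/// Check once with `if let`; inside the block `cfg` is a plain &Config,
/// so later uses need no further null test and no cast.
fn describe(maybe: Option<&Config>) -> String {
    if let Some(cfg) = maybe {
        let name = &cfg.path;
        format!("{} ({} chars)", name, name.len())
    } else {
        String::from("no config")
    }
}

/// When the programmer knows the value is present, `expect` states that
/// assumption explicitly. If it is wrong, the program stops on this line
/// with this message, much as a null dereference would.
fn must_have(maybe: Option<&Config>) -> &Config {
    maybe.expect("config must be loaded before use")
}
```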
Sep 26 2009
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Denis Koroskin wrote:
 If an object may or may not have a valid value, you mark it as 
 nullable. All the difference is that it's a non-default behavior, 
 that's it. And a user is now warned, that an object may be not 
 initialized.

He isn't warned, that's just the problem. The null object happily says "I succeeded" for all input and returns more default values and null objects.

This is not the proposal. The proposal was to codify in the type system whether a particular object has "null" as a valid value. If a variable that cannot be null is not initialized to a non-null value before use, that is an error. It's entirely equivalent to using the current type system with a ton of manual contracts requiring that variables not be null. Except the contracts are enforced at compile time, not runtime. A similar concept would be range-bounded integer types, or floating point types that cannot be NaN or infinity.
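The scheme described here, where a non-nullable variable needs no default because using it before assignment is a compile-time error, matches Rust's definite-initialization check, sketched below (Rust for illustration only; `Object`, `choose` and `nonzero` are hypothetical names). `NonZeroU32` from the Rust standard library plays the role of a range-bounded integer type.

```rust
use std::num::NonZeroU32;

struct Object {
    id: u32,
}

/// `chosen` is declared with no initializer and no default value. The
/// compiler accepts the use on the last line only because every path
/// assigns it; deleting either assignment is a compile-time error.
fn choose(prefer_first: bool, a: &Object, b: &Object) -> u32 {
    let chosen: &Object;
    if prefer_first {
        chosen = a;
    } else {
        chosen = b;
    }
    chosen.id
}

/// A bounded integer in the same spirit: NonZeroU32 cannot hold 0, so
/// building one from an arbitrary u32 yields an Option that must be checked.
fn nonzero(n: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(n)
}
```

The deferred `let chosen: &Object;` form is deliberate here; idiomatic code would usually write `let chosen = if prefer_first { a } else { b };`, which the compiler checks the same way.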
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch these 
 errors on compile time. Sure, the code is a bit harder to write, but it 
 is safe and never segfaults. The idea is to minimize the amount of 
 runtime errors of all sorts. That's also how other features of statically 
 typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them. Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.
Sep 26 2009
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch these 
 errors on compile time. Sure, the code is a bit harder to write, but it 
 is safe and never segfaults. The idea is to minimize the amount of 
 runtime errors of all sorts. That's also how other features of statically 
 typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them.

If you argued any cases other than "there's no good default initialization", I missed it. I reject the default initialization argument. I find code that relies on default initialization hard to read. I also find C#'s warning of uninitialized variables highly useful. I've also never had a bug that Don's signalling nans would help with. I've had nan bugs that cropped up later though... On top of that, uses of a null variable because it was never set are typically easy to find.
 Also, by "safe" I presume you mean "memory safe" which means free of 
 memory corruption. Null pointer exceptions are memory safe. A null 
 pointer could be caused by memory corruption, but it cannot *cause* 
 memory corruption.

I reject this argument too :( To me, code isn't safe if it crashes.

Did Boeing avoid checking for fault modes that were easily and reliably detectable? It seems stupid to argue that it's ok for an altimeter to send bogus data as long as it's easy to detect. All you have to do is turn off autopilot. Who cares, right?

Why should I use D for production code if it's designed to segfault? Software isn't used for important things like autopilot, controlling the brakes in my car, or dispensing medicine in hospitals. There's no problem allowing that stuff to crash. You can always recover the core file, and it's always trivial to reproduce the scenario...

Mix in other things like malfunctioning debug data, and I wonder why I even use D.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Also, by "safe" I presume you mean "memory safe" which means free
 of memory corruption. Null pointer exceptions are memory safe. A
 null pointer could be caused by memory corruption, but it cannot
 *cause* memory corruption.

I reject this argument too :( To me, code isn't safe if it crashes.

Well, we can't discuss this if we cannot agree on terms. The conventional definition of memory safe means no memory corruption. A null pointer dereference is not memory corruption. You can call it something else, but if you call it "unsafe" then people will misunderstand you.
 Did Boeing avoid checking for fault modes that were easily and
 reliably detectable? It seems stupid to argue that it's ok for an
 altimeter can send bogus data as long as it's easy to detect. All you
 have to do is turn off autopilot. Who cares, right?

Errors in incorrectly initialized data are not easily and reliably detectable. A null pointer, on the other hand, *is* reliably detectable by the hardware.

Boeing's philosophy is that if the airplane cannot tolerate a particular system failing abruptly and completely, then the design is faulty. That's also what the FAA regulations require. Safety is achieved NOT by designing systems that cannot fail, but by designing systems that can survive failure.

In particular, if the airplane cannot handle turning off the autopilot, it will be rejected by both Boeing and the FAA. Name any single part or system on a Boeing airliner, and if it vanishes abruptly in a puff of smoke, the airliner will survive it. There is no "the autopilot is receiving corrupted data, but what the hell, we'll keep it turned on anyway". It's inconceivable.

The only reasonable thing a program can do if it discovers it is in an unknown state is to stop immediately. The only reasonable way to use a program is to be able to tolerate its complete failure.
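The fail-stop-plus-backup principle can be sketched in a few lines of Rust (an illustration only, not how any avionics system is built; `primary_altitude`, `backup_altitude` and `altitude` are hypothetical). The primary stops by panicking as soon as it detects an impossible reading, and the caller, designed to survive that failure, engages the backup instead of trusting the primary's output.

```rust
use std::panic;

/// Primary sensor: on detecting an impossible reading it stops at once
/// instead of passing bad data along.
fn primary_altitude(raw: i32) -> i32 {
    if raw < 0 {
        panic!("primary altimeter in unknown state: raw = {}", raw);
    }
    raw
}

/// Independent backup source (just a fixed value for illustration).
fn backup_altitude() -> i32 {
    10_000
}

/// The caller is designed to survive total failure of the primary:
/// if it stops, the backup is engaged and the result says so.
fn altitude(raw: i32) -> (i32, &'static str) {
    match panic::catch_unwind(|| primary_altitude(raw)) {
        Ok(alt) => (alt, "primary"),
        Err(_) => (backup_altitude(), "backup"),
    }
}
```

This relies on Rust's default unwinding panics; built with `panic = "abort"`, the whole process stops instead, which is the stricter form of the same policy. The default panic hook still prints the primary's failure message to stderr.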
 Why should I use D for production code if it's designed to segfault?
 Software isn't used for important things like autopilot, controlling
 the brakes in my car, or dispensing medicine in hospitals. There's no
 problem allowing that stuff to crash. You can always recover the core
 file, and it's always trivial to reproduce the scenario...

It's not designed to segfault. It's designed to expose errors, not hide them.

The system that uses the autopilot is designed to survive total failure of the autopilot. The same goes for the brakes in your car (ever wonder why there are dual brake systems, and if your power assist fails you can still use the brakes?). I don't know how the ABS works, but I would bet you plenty that if the computer controlling it fails, the brakes will still function.

And you can bet your life (literally) that a computer dispensing radiation or medicine into your body had better stop immediately if it detects it is in an unknown state. Do you *really* want the radiation machine to continue operating if it has self-detected a program bug? Do you really want to BET YOUR LIFE that the software in it is perfect? Do you think that requiring the software be literally perfect is a reasonable, achievable, and safe requirement? I don't. Not for a minute.

And NOTHING Boeing designs relies on perfection for safety, either. In fact, the opposite is true, the designs are all based on "what if this fails?" If the answer is "people die" then the engineers are sent back to the trenches.

Hospitals are way, way behind on this approach. Even adding simple checklists (pilots started using them 70 years ago) has reduced accidental deaths in hospitals by 30%, a staggering improvement.
 Mix in other things like malfunctioning debug data, and I wonder why
 I even use D.

The debug data is a serious problem, and I think I've got it corrected now.
Sep 27 2009
"Manfred_Nowak" <svv1999 hotmail.com> writes:
Walter Bright wrote:

 Name any single part or system on a Boeing airliner, and if it
 vanishes abruptly in a puff of smoke, the airliner will survive it. 

Except for this sentence, I applaud every thought. If "single part" includes the passenger area, the meaning of this sentence is downright ridiculous.

-manfred
Sep 27 2009
"Nick Sabalausky" <a a.a> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:h9n3k5$2eu9$1 digitalmars.com...
 Jason House wrote:
 Also, by "safe" I presume you mean "memory safe" which means free
 of memory corruption. Null pointer exceptions are memory safe. A
 null pointer could be caused by memory corruption, but it cannot
 *cause* memory corruption.

I reject this argument too :( To me, code isn't safe if it crashes.

Well, we can't discuss this if we cannot agree on terms. The conventional definition of memory safe means no memory corruption.

He keeps saying "safe", and every time he does you turn it into "memory safe". If he meant "memory safe" he probably would have said something like "memory safe". He already made it perfectly clear he's talking about crashes, so continuing to put the words "memory safe" into his mouth doesn't help the discussion.
 Boeing, Boeing, Boeing, Boeing, Boeing...

Straw man. No one's arguing against designing systems to survive failure, and no one's arguing against forcing errors to be exposed. Your point seems to be: A good system is designed to handle a crash/failure without corruption, so let's allow things to crash/fail all they want. Our point is: A good system is designed to handle a crash/failure without corruption, but let's also do what we can to minimize the amount of crashes/failures in the first place. You're acting as if handling failures safely and minimizing failures were mutually exclusive.
 It's not designed to segfault. It's designed to expose errors, not hide 
 them.

Right. And some of these errors can be exposed at compile time...and you want to just leave them as runtime segfaults instead? And you want this because exposing an error at compile time somehow causes it to become a hidden error?
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:

 He keeps saying "safe", and every time he does you turn it into "memory 
 safe". If he meant "memory safe" he probably would have said something like 
 "memory safe". He already made it perfectly clear he's talking about 
 crashes, so continuing to put the words "memory safe" into his mouth doesn't 
 help the discussion.

Likewise, I think that the name of SafeD modules is misleading; they are MemorySafeD :-)

Bye,
bearophile
Sep 27 2009
Lutger <lutger.blijdestijn gmail.com> writes:
Nick Sabalausky wrote:

 "Walter Bright" <newshound1 digitalmars.com> wrote in message

 You're acting as if handling failures safely and minimizing failures were
 mutually exclusive.

Not that I have an opinion on this either way, but if I understand Walter right that is exactly his point (although you exaggerate it a bit), see below.
 It's not designed to segfault. It's designed to expose errors, not hide
 them.

Right. And some of these errors can be exposed at compile time...and you want to just leave them as runtime segfaults instead? And you want this because exposing an error at compile time somehow causes it to become a hidden error?

somehow -> encourages a practice where programmers get annoyed by the 'exposing of errors' to the point that they hide them

This is what it's about, I think: are non-nullable references *by default* so annoying as to cause programmers to initialize them with wrong values (or to circumvent them in other ways)? The answer may depend on the details of the feature, quality of implementation and on the habits of the 'programmers' in question, I don't know.
Sep 27 2009
Jason House <jason.james.house gmail.com> writes:
Lutger Wrote:

 This is what it's about, I think: are non-nullable references *by default* 
 so annoying as to cause programmers to initialize them with wrong values (or 
 circumventing them in other ways)? 
 The answer may depend on the details of the feature, quality of 
 implementation and on the habits of the 'programmers' in question, I don't 
 know. 

In reality, the issue becomes what programmers will do to bypass compiler errors. This is one area where syntactic sugar is worth its weight in gold. I'm envisioning the syntax of Fan, or a very C#-like syntax:

  SomeType x;  // Not nullable
  SomeType? y; // Nullable

If the developer is too lazy to add the question mark and prefers to do

  SomeType x = cast(SomeType) null;

then it's their own fault when they get a runtime segfault to replace a compile-time error.
Sep 27 2009
BCS <none anon.com> writes:
Hello Lutger,

 The answer may
 depend on [...]
 the habits of the 'programmers' in question, I don't know.
 

If you can't trust the programmer to write good code, replace them with someone you can trust. There will never be a usable language that can take in garbage and spit out correct programs.
Sep 27 2009
Lutger <lutger.blijdestijn gmail.com> writes:
BCS wrote:

 Hello Lutger,
 
 The answer may
 depend on [...]
 the habits of the 'programmers' in question, I don't know.
 

If you can't trust the programmer to write good code, replace them with someone you can trust. There will never be a usable language that can take in garbage and spit out correct programs.

Hi. I don't think this argument will work, for several reasons.

First, there is a huge demand for programmers, so much that even I got hired in this time of crisis ;) Good programmers don't suddenly fall from the skies, apparently.

Second, there are lots of tasks doable by programmers with less skill than others using tools that trade safety for performance / expressiveness / whatever.

Finally, programmers are humans, and humans make mistakes, have quirks and bad days. All of them.

What it comes down to is that languages are made to serve and adapt to programmers, not the other way around. Do you maintain that a programmer who can't deal with non-nullable references without hacking them away is unusually incompetent? I don't know about this. Actually, I suspect non-nullable references by default are in the end safer (whatever that means), but only if they don't complicate the use of nullable references.
Sep 27 2009
BCS <none anon.com> writes:
Hello Lutger,

 BCS wrote:
 
 Hello Lutger,
 
 The answer may
 depend on [...]
 the habits of the 'programmers' in question, I don't know.

with someone you can trust. There will never be a usable language that can take in garbage and spit out correct programs.


 
 Do you maintain that a programmer who can't deal with non-nullable
 references without hacking them away is unusually incompetent?

Incompetent? No. But I wouldn't want to hire a programmer who *habitually* (and unnecessarily) hacks past a feature designed to prevent bugs. The best race car driver in the world is clearly not incompetent, but would still get a ticket on public roads for speeding or following too closely.
 I don't
 know about this. Actually I suspect non-nullable references by default
 are in the end safer (whatever that means), but only if they don't
 complicate the use of nullable references.

I'll second that.
Sep 27 2009
"Manfred_Nowak" <svv1999 hotmail.com> writes:
BCS wrote:

[...]
 I wouldn't want to hire a programer that *habitually* (and
 unnecessarily) hacks past a feature designed to prevent bugs.

In the short time of an interview it's not possible to test for the habit (or necessity) of hacking past a feature designed to prevent bugs. Therefore the only measures of code quality are the number of bugs detected by the users, or the number of WTFs exclaimed during a code review. Are you able to give an upper limit for the number of WTFs during a code review for which the coder is not fired? -manfred
Sep 28 2009
BCS <none anon.com> writes:
Hello Manfred_Nowak,

 BCS wrote:
 
 [...]
 
 I wouldn't want to hire a programer that *habitually* (and
 unnecessarily) hacks past a feature designed to prevent bugs.
 

(or necessarity) to hack past a feature designed to provent bugs.

Good point. I guess all that is left would be to try to get a feel for what they think of that kind of practice (give them something ugly that works and ask "what do you think of this code?"). If they indicate they think that kind of hacking is a bad idea, then at least you can say they lied if you have to get rid of them for that kind of thing.
Sep 28 2009
language_fan <foo bar.com.invalid> writes:
Mon, 28 Sep 2009 20:34:44 +0000, BCS thusly wrote:

 Hello Manfred_Nowak,
 
 BCS wrote:
 
 [...]
 
 I wouldn't want to hire a programer that *habitually* (and
 unnecessarily) hacks past a feature designed to prevent bugs.
 

(or necessarity) to hack past a feature designed to provent bugs.

Good point, I guess that all that is left would be to try and get a feel for what they think of that kind of practice (give them something ugly that works and ask "what do you think of this code?"). If they indicate they think that kind of hacking a bad idea, then at least you can say they lied if you have to get rid of them for that kind of things.

At least in the companies I have worked in, they briefly teach you their stuff in 1-7 days and want to see some preliminary results. If you have trouble writing any code, you have lost the job (there is a 6-month trial period or something similar, so it is perfectly legal to kick you out if you fail). Usually the schedules are tight, so hiring a lazy bastard is not worth the effort. Other ways to control the learning are working with a more experienced pair (pair programming - ever heard of it?) and weekly meetings.
Sep 28 2009
bearophile <bearophileHUGS lycos.com> writes:
Lutger:

 First, there is a huge demand for programmers, so much that even I got hired 
 in this time of crisis ;) Good programmers don't suddenly fall from the 
 skies apparently. 

This is the nicest thing I've read this week. Thank you very much :-) Biologists aren't that lucky, apparently. Bye, bearophile
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Jesse Phillips:

 The thing is that memory safety is the only safety with code.

Nope. For example, in Delphi and C# you can have runtime integer overflow errors. That's another kind of safety. If you look at safety-critical code, the kind Walter was talking about, you see people test code very thoroughly (at compile time too), looking for an enormous number of possible errors. Doing this increases code safety. That is how you can have ABS brakes, CT scanners in hospitals, automatic pilots and so on. Bye, bearophile
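For comparison, D does not trap integer overflow by default, but druntime's `core.checkedint` module offers opt-in detection. A minimal sketch, assuming you want an exception instead of silent wraparound; `checkedAdd` is a hypothetical helper, not a library function:

```d
import core.checkedint : adds;

// Hypothetical helper: signed addition that refuses to wrap around silently.
int checkedAdd(int a, int b)
{
    bool overflow = false;                    // adds() sets this on overflow and never clears it
    immutable int sum = adds(a, b, overflow); // sum is the wrapped result if overflow occurred
    if (overflow)
        throw new Exception("integer overflow in checkedAdd");
    return sum;
}
```

Because the overflow flag is sticky, it must start out false; several checked operations can share one flag and be tested once at the end.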
Sep 27 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jesse Phillips wrote:
 The thing is that memory safety is the only safety with code.

That is such bullshit. For example, this:

  class A { }
  class B { }
  A x = new B;

No memory access violation (yet). Clearly incorrect. Detecting this at compile time is clearly a safety feature, and a good one. You could argue that assigning a 'B' to a variable that is declared to hold an 'A' is already a memory safety violation. If so, then the exact same argument also applies to assigning 'null' to the same variable. -- Rainer Deyke - rainerd eldwood.com
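The example can be checked against the compiler: in D the unrelated-class assignment is a compile-time error, while a null of the declared type is accepted. A minimal sketch using `__traits(compiles, ...)` so both facts show up as static assertions instead of a broken build; `A` and `B` mirror the classes above:

```d
class A { }
class B { } // unrelated to A: no inheritance between them

// The type error is rejected at compile time...
static assert(!__traits(compiles, { A x = new B; }));

// ...while storing null in a variable of the same type is accepted silently.
static assert(__traits(compiles, { A y = null; }));
```

Both lines describe "a variable holding something that is not a valid A"; only the first is caught statically.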
Sep 27 2009
Jesse Phillips <jesse.k.phillips+d gmail.com> writes:
Rainer Deyke Wrote:

 You could argue that assigned a 'B' to a variable that is declared to
 hold an 'A' is already a memory safety violation.  

Yeah, a friend brought to my attention that "type safety" could be another form. bearophile also brings up a good example.
If so, then the exact argument also applies to assigning 'null' to the same
variable.

I think that is what Walter is getting at: you're not dealing with memory that is correct, and when this happens the program should halt and be dealt with from outside the program.
Sep 28 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jesse Phillips wrote:
 Yeah, it was brought to my attention that "type safety" by a friend
 could be another form. bearophile also brings up a good example.

<snip>
 I think that is what Walter is getting at, you're not dealing with
 memory that is correct, when this happens the program should halt and
 be dealt with from outside the program.

Type errors and null pointer errors both belong to the same class of errors, namely variables containing bogus contents. Some languages like Python detect both at runtime. That's fine for those languages. However, I prefer to detect as many errors as possible at compile time, especially for larger projects. Nullable types turn compile time errors into runtime errors which may or may not be detected during testing. In the worst case, nullable types lead to silent data corruption. Consider what happens when a bogus null field is serialized. -- Rainer Deyke - rainerd eldwood.com
Sep 28 2009
Jesse Phillips <jesse.k.phillips+d gmail.com> writes:
language_fan Wrote:

 Now if you really want to throw some sticks into the spokes, you would
 say that if the program crashes due to a null pointer, it is still
 likely that the programmer will just initialize/set the value to a
 "default" that still isn't valid just to get the program to continue to
 run.

Why should it crash in the first place? I hate crashes. You liek them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

No one likes programs that crash; doesn't that mean it is incorrect behavior, though?
 Have you ever used functional languages? When you develop in Haskell or 
 SML, how often you feel there is a good change something will be 
 initialized to the wrong value? Can you show some statistics that show 
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, only that it is possible.
Sep 28 2009
Adam Burton <adz21c googlemail.com> writes:
I don't know if what I am about to rant about has already been discussed and 
I haven't noticed, but sometimes I feel like sticking my opinions in and 
this seems to be one of them times :-), so bear with me and we'll see if I am 
a crazy man blabbing on about crap or not :-).

language_fan wrote:

 Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips thusly wrote:
 
 language_fan Wrote:
 
 Now if you really want to throw some sticks into the spokes, you
 would say that if the program crashes due to a null pointer, it is
 still likely that the programmer will just initialize/set the value
 to a "default" that still isn't valid just to get the program to
 continue to run.

Why should it crash in the first place? I hate crashes. You liek them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

No one likes programs that crash, doesn't that mean it is an incorrect behavior though?
 Have you ever used functional languages? When you develop in Haskell or
 SML, how often you feel there is a good change something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state?


  Foo obj;   // Machine default of null, right?
  obj.bar(); // Null pointer exception due to null being bad state for the app

Now in steps the moron programmer, who would put garbage data into non-nullable vars to init them, to fix the issue:

  Foo obj = new Foo("bleh"); // Fix to avoid null pointer exception (and yes, I have seen people do this)
  obj.bar();                 // Logic error, but the application soldiers on
 If it does, do we continue to
 work as if nothing was wrong or crash?


what I mean but I don't see how this is pertinent to the discussion. [1]
 I don't know how often the
 initialization would be incorrect, but I don't think Walter is concerned
 with it's frequency, but that it is possible.


shown it's possible to insert garbage into classes, so maybe we should drop them too? Also, the machine default (null) seemed to screw up too. I think there's a point where you have to trust the human factor to do its job correctly. If the feature were so ridiculously complex (like depending on planetary alignment) that it forced the programmer into stupid practices, then fair enough: even if it's likely most will get it right, that's a problem with the feature, not the programmer (although I would personally say this isn't the case, provided I have understood the feature correctly :-P) ... if that makes sense. (So any technical issues, e.g. I believe someone mentioned enforcing it in structs allocated with malloc, are good points that I am just not technical enough to comment on; consider me the casual hobby reader who has an interest, but not a good background, in systems languages.) However, the previous discussions, as I remember them, seem to assume the programmer is an idiot who will initialize with crap, which I think is simply out of the language's control.
 ...
 
 It really depends on your subjective opinion whether you want a program
 to segfault or spot a set of errors statically, and have illegally
 behaving non-crashing programs. I say FFFFFFFFFFUUUUUUUUUUU every time I
 experience a segfault. My hobby programs at home are not that critical,
 and at work the critical code is *proven* to be correct so no need to
 worry there.

it should crash or continue; without more in-depth knowledge it's hard to say. In some applications it may be possible to crash a process within itself (so just throw an exception) and return the application to a reasonable state from which it can continue (like crashing back to the application's main menu and letting you start again). Other apps you may want to kill there and then (but die gracefully, so roll back transactions etc.) before they do more harm.

Regardless, the above two arguments, crashing vs. continuing and the incompetence of some developers, seem to have no bearing on non-nullable references. Ignoring the fact that a function with all non-nullable variables could still crash with some exception other than a null pointer exception, it seems to me that, if anything, non-nullables just make the application crash earlier when it receives a null where one is not expected. Consider the following, implemented "normally":

  void FuncOne(Foo foo) {
      ....
      foo = null; // The bug
      ....
      FuncTwo(foo);
      ....
  }

  // Does not expect null
  void FuncTwo(Foo foo) {
      foo.bar(); // null pointer exception
  }

It's a trivial example, but consider that there are chunks of code you can't see that may also use foo, which you would need to investigate to see whether they set it to null, so there are plenty of code paths to search. Now consider it with non-nullables:

  void FuncOne(Foo? foo) {
      ....
      foo = null; // The bug
      ....
      FuncTwo(enforce(foo)); // [2]
      ....
  }

  // Does not expect null
  void FuncTwo(Foo foo) {
      foo.bar();
  }

[2] Here I am guessing at what people mean by enforce. My assumption is that it checks whether foo is null and throws a null pointer exception if so. Otherwise it lets the application continue executing, and it also skips the compiler check that we are passing a nullable into a non-nullable.
So first off, without enforce, [2] would have been a compiler error that would have made me investigate this potential bug anyway (whether I should have an alternate code path, or take a more in-depth look at the design), but let's assume I think it should never get to that state because it's not valid for it to be null at [2] (though it is elsewhere in FuncOne). So on execution we get an exception at [2], and we die earlier than we did in the nullable form. Not only do we have less to search (a lot less, cos not only does the trace give us less, but any other functions that only take non-nullables also remove code paths to check, making the search area much smaller; sounds productive), but we also killed the application earlier, before it did even more damage (like putting a plane into a dive, maybe?).

I wanted to point that out because I am sure Walter noticed it moves the error from one place to another, but I don't think anyone has pointed out that it is a way to identify bad application state earlier (which seems to be the focus of one argument) rather than later. It seems that code paths that allow nulls sooner or later turn into ones that don't, because otherwise the variables are pointless, so by telling the compiler where the path turns non-null you can get it to trigger the exceptions early, which I would think is in line with crashing the application when there is bad state. I also see non-nullables helping track down potential bugs when changing variables to non-nullable, and removing unnecessary code when going the other way.

Seems to me non-nullable is in the same sort of area as const/immutable. Whereas immutable or const prevent data changes from happening, to stop bad application state, non-nullables prevent null going where it's not allowed, for the same reason, with numerous other productivity benefits. Are these the ramblings of a sleep-deprived mad man getting involved with things he doesn't understand? You decide :-).
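D has no `Foo?` syntax, but Phobos does have `std.exception.enforce`, which returns its argument when it is non-null (truthy) and throws an `Exception` otherwise. A minimal sketch of the "fail at the boundary" idea under that assumption; `Foo`, `funcOne` and `funcTwo` are hypothetical stand-ins for the names above, and unlike the proposal there is no compile-time check here, only an earlier runtime failure:

```d
import std.exception : enforce;

// Hypothetical class standing in for Foo in the post.
class Foo
{
    int bar() { return 42; }
}

// Stand-in for FuncTwo: written on the assumption that foo is never null.
int funcTwo(Foo foo)
{
    return foo.bar();
}

// Stand-in for FuncOne: foo may legitimately be null in here, so it is
// checked at the point where it flows into code that does not expect null.
// enforce throws an Exception if foo is null, otherwise returns foo unchanged.
int funcOne(Foo foo)
{
    return funcTwo(enforce(foo, "funcOne: foo must not be null"));
}
```

The failure then points at the call in funcOne rather than somewhere inside funcTwo, which is the "crash earlier" effect described above.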
Sep 28 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
language_fan wrote:
 Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips thusly wrote:
 
 language_fan Wrote:

 Now if you really want to throw some sticks into the spokes, you
 would say that if the program crashes due to a null pointer, it is
 still likely that the programmer will just initialize/set the value
 to a "default" that still isn't valid just to get the program to
 continue to run.

I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

behavior though?
 Have you ever used functional languages? When you develop in Haskell or
 SML, how often you feel there is a good change something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with it's frequency, but that it is possible.

Value types can be incorrectly initialized and nobody notices. E.g.

  int min;

  foreach(int value; list)
    if (value < min) min = value;

Oops, you forgot to define a flag variable or initialize to int.min

You mean int.max :o). Andrei
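A small sketch of both versions, to make the bug concrete; `buggyMin` and `seededMin` are hypothetical names. Seeding with int.max fixes the all-positive case, but an empty list then silently returns int.max instead of failing:

```d
// The pattern from the post: `lowest` starts at int.init, which is 0,
// so any list whose values are all positive wrongly reports 0.
int buggyMin(const int[] list)
{
    int lowest;
    foreach (value; list)
        if (value < lowest) lowest = value;
    return lowest;
}

// Seeded with int.max: correct for non-empty lists, but an empty list
// produces int.max rather than an error.
int seededMin(const int[] list)
{
    int lowest = int.max;
    foreach (value; list)
        if (value < lowest) lowest = value;
    return lowest;
}
```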
Sep 28 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 language_fan wrote:
 Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips thusly wrote:

 language_fan Wrote:

 Now if you really want to throw some sticks into the spokes, you
 would say that if the program crashes due to a null pointer, it is
 still likely that the programmer will just initialize/set the value
 to a "default" that still isn't valid just to get the program to
 continue to run.

I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

behavior though?
 Have you ever used functional languages? When you develop in Haskell or
 SML, how often you feel there is a good change something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with it's frequency, but that it is possible.

Value types can be incorrectly initialized and nobody notices. E.g. int min; foreach(int value; list) if (value < min) min = value; Oops, you forgot to define a flag variable or initialize to int.min

You mean int.max :o). Andrei

He just proved how enforcing initializers can still cause errors! I didn't even think of that one! :o)
Sep 28 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 28 Sep 2009 19:27:03 -0500, Andrei Alexandrescu wrote:

 language_fan wrote:
 
   int min;
 
   foreach(int value; list)
     if (value < min) min = value;
 
 Oops, you forgot to define a flag variable or initialize to int.min

You mean int.max :o).

  // An empty or null list has no minimum
  if (list.length == 0) throw new Exception("empty list has no minimum");
  int min = list[0];
  foreach(int value; list[1..$])
      if (value < min) min = value;

I'm still surprised by Walter's stance. For the purposes of this discussion...

* Null only applies to the memory address portion of reference types and not to value types. The discussion is not about non-nullable value types.

* There are two kinds of reference types:
  (1) Those that can be initialized on declaration because the coder knows what to initialize them to; a.k.a. non-nullable. If the coder does not know what to initialize them to at declaration time, then either the design is wrong, the coder doesn't understand the algorithm or application, or it is truly a complex run-time decision.
  (2) Those that aren't in set (1); a.k.a. nullable.

* The standard declaration should imply non-nullable, and if such a reference is not initialized the compiler should complain. This encourages protection but does not guarantee it, of course.

* To declare a nullable type, use a special syntax to denote that the coder is deliberately choosing to declare a nullable reference.

* The compiler will prevent non-nullable types from simply being set to null. As D is also a systems language, there will be rare cases that need to subvert this compiler protection, so there will need to be a method to explicitly set a non-nullable type to null. The point is that such a method should be a visible warning beacon to maintenance coders.

Priority should be given to coders who prefer safe coding. If a coder, for whatever reason, chooses to use nullable references or to initialize a non-nullable reference with rubbish data, then the responsibility is on them to ensure safe applications. Safe coding practices should not be penalized. C and C++ are inherently "unsafe" in this regard, and that is not news to anyone. The D programming language does not have to follow this paradigm.
I'm still not ready to use D for anything, but I watch it in hope. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Sep 28 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Derek Parnell wrote:
 On Mon, 28 Sep 2009 19:27:03 -0500, Andrei Alexandrescu wrote:
 
 language_fan wrote:
   int min;

   foreach(int value; list)
     if (value < min) min = value;

 Oops, you forgot to define a flag variable or initialize to int.min


if (list.length == 0) throw( some exception); // An empty or null list has no minimum int min = list[0]; foreach(int value; list[1..$]) if (value < min) min = value; I'm still surprised by Walter's stance. For the purposes of this discussion... * Null only applies to the memory address portion of reference types and not to value types. The discussion is not about non-nullable value types. * There are two types of reference types: (1) Those that can be initialized on declaration because the coder knows what to initialize them to; a.k.a. non-nullable. If the coder does not know what to initialize them to at declaration time, then either the design is wrong, the coder doesn't understand the algorithm or application, or it is truly a complex run-time decision. (2) Those that aren't in set (1); a.k.a. nullable. * The standard declaration should imply non-nullable. And if not initialized the compiler should complain. This encourages protection, but does not guarantee it, of course. * To declare a nullable type, use a special syntax to denote that the coder is deliberately choosing to declare a nullable reference. * The compiler will prevent non-nullable types being simply set to null. As D is a system language too, there will be a rare cases that need to subvert this compiler protection, so there will need to be a method to explicitly set a non-nullable type to a null. The point is that such a method should be a visible warning beacon to maintenance coders. Priority should be given to coders that prefer safe coding. If a coder, for whatever reason, chooses to use nullable references or initialize non-nullable reference to rubbish data, then the responsibility is on them to ensure safe applications. Safe coding practices should not be penalized. The C/C++ programming language is inherently "unsafe" in this regard, and that is not news to anyone. The D programming language does not have to follow this paradigm.

But it doesn't have to follow the paranoid safety paradigm either. I wouldn't like two reference types, and casting between the two, when they're essentially the same except that one has a single value, out of 4 billion possibilities, that can't be set. Seems like a waste to me, especially since 3 billion of those possibilities will result in the same segfault crash as the one you're trying to make illegal on nonnull types.
 I'm still not ready to use D for anything, but I watch it in hope.

I'm already using D quite a lot, and I don't find null vs. nonnull references all that meaningful. Like Walter said, you can just make your own nonnull invariant. Here's a very, very simple wrapper; it took 10 seconds to write:

  struct NonNull(C) if(is(C == class)) {
      C obj;  // `ref` is a keyword in D, so the field needs another name
      invariant() { assert(obj !is null); }
      C opDot() { return obj; }  // forwards member access; newer compilers would use `alias obj this;`
  }

C++ has all sorts of pointer wrappers like this one, and you don't see a smart pointer feature built into the C++ language precisely because the library versions are widely used and safer. In fact, leaving the semantics of these pointers up to libraries allows any project to write its own custom ones, and quite a lot do. It should be the same for D. I believe it's better to implement flow analysis and let the compiler warn you of uninitialized variables (which would solve most null references, the other half being handled by NonNull!Object fields). The compiler could also provide better tools to build smart wrapper types upon (like forcing initialization or preventing void initialization, heck, even providing a tuple of valid initializers) and let libraries write their own. Jeremie
Sep 29 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jeremie Pelletier wrote:
 struct NonNull(C) if(is(C == class)) {
     C obj;
     invariant() { assert(obj !is null); }
     C opDot() { return obj; }
 }

This only catches null errors at runtime. The whole point of a non-null type is to catch null errors at compile time. -- Rainer Deyke - rainerd eldwood.com
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Rainer Deyke wrote:
 Jeremie Pelletier wrote:
 struct NonNull(C) if(is(C == class)) {
     C obj;
     invariant() { assert(obj !is null); }
     C opDot() { return obj; }
 }

This only catches null errors at runtime. The whole point of a non-null type is to catch null errors at compile time.

That's what flow analysis is for, since these are mostly uninitialized variables rather than null ones. It's dead easy to insert null into a nonnull reference, and since you expect the type to never be null, it's the last thing you're gonna check. If variables are properly initialized, you'll never get null where you don't expect it, and those are checked at compile time too, and work on every type.
Sep 29 2009
bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 Its dead easy to insert null into a nonnull reference,

If it's easy to put a null into a nonnull by *mistake*, then that system needs to be designed better.
 and since you 
 expect the type to never be null its the last thing you're gonna check.

I agree, but I think in a well designed system such situations are really uncommon.
 If variables are properly initialized, you'll never get null where you 
 don't expect it, and those are checked at compile time too, and work on 
 every type.

Cyclone is an example of a language where there is both flow analysis (in a very C-like language that allows some kinds of gotos too; maybe someone here could read their source code and adapt it to D. [One of the weirder characteristics of open source programs is that hardly anyone ever reads/copies code/solutions from other open source projects; and I don't think those stupid/idiotic differences in OSS licences are enough to justify such behaviours. I think there's also a strong amount of NIH syndrome. So I don't hold my breath for the day when D will start working with Mono C# devs to design a better GC that can be tuned and used for both of these open source languages/implementations, which have different but not totally different GC needs]) and optional nonnull references (well, pointers). I think Cyclone shows how to design a safer C-like language. And making D safer is simpler than making C safer, even though D is more complex than C. Bye, bearophile
Sep 29 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jeremie Pelletier wrote:
 Rainer Deyke wrote:
 This only catches null errors at runtime.  The whole point of a non-null
 type is to catch null errors at compile time.

Thats what flow analysis is for, since these are mostly uninitialized variables rather than null ones.

Nitpick: there are no uninitialized variables in D (unless you specifically request them). There are explicitly initialized variables and default-initialized variables. I can see the argument for disabling default initialization and requiring explicit initialization. You don't even need flow analysis for that. However, that doesn't address the problem that non-null references are intended to solve. It's still possible to explicitly store a null value in a non-null reference without the problem being detected at compile time. -- Rainer Deyke - rainerd eldwood.com
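A sketch of the "disable default initialization" route in current D, assuming `@disable this();` on a wrapper struct; `NonNull` and `Foo` here are hypothetical, not Phobos types. Declaring a `NonNull!Foo` without an initializer becomes a compile-time error, but an explicit null still only fails at runtime in the constructor, which is exactly the gap described above. As far as I know, `NonNull!Foo.init` also still holds a null payload.

```d
// Hypothetical class used only for this sketch.
class Foo
{
    int v = 7;
}

// Hypothetical wrapper: forbids "declared but never initialized".
struct NonNull(C) if (is(C == class))
{
    private C payload;

    @disable this(); // `NonNull!Foo x;` no longer compiles

    this(C value)
    {
        // An explicit null is not rejected by the type system;
        // it is only caught here, at runtime.
        if (value is null)
            throw new Exception("null passed to NonNull");
        payload = value;
    }

    C get() { return payload; }
    alias get this; // forward member access to the wrapped reference
}
```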
Sep 29 2009
BCS <none anon.com> writes:
Hello Walter,

 The only reasonable thing a program can do if it discovers it is in an
 unknown state is to stop immediately.
 

This whole thread is NOT about what to do in unknown states. It is about using the compiler to statically remove the possibility of one type of unknown state ever happening. If D were to get non-null by default, with optional nullable, then without ASM/union hacks or the like, you could only get a seg-v when you use the non-default nullable type. Given the above (and assuming memory safety), the only possible wrong-data error left would be where the programmer explicitly places the wrong value in a variable. In my book, that argument is a non-starter because 1) it can happen now, 2) it can happen anywhere, not just at initialization, 3) it can't be detected, and 4) (assuming a well-done syntax) in the cases where the compiler can't validate the code, the lazy thing to do and the correct thing to do (use a nullable type) will be the same.
Sep 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Nick Sabalausky wrote:

I agree with you that if the compiler can detect null dereferences at 
compile time, it should.


 Also, by "safe" I presume you mean "memory safe" which means free of 
 memory corruption. Null pointer exceptions are memory safe. A null pointer 
 could be caused by memory corruption, but it cannot *cause* memory 
 corruption.

No, he's using the real meaning of "safe", not the misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakingly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

Memory safety is something that can be guaranteed (presuming the compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.
 You seem to be under the impression that nothing can be made uncrashable 
 without introducing the possibility of corrupted state. That's hogwash.

I read that statement several times and I still don't understand what it means. BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.
Sep 27 2009
downs <default_357-line yahoo.de> writes:
Walter Bright wrote:
 Nick Sabalausky wrote:
 
 I agree with you that if the compiler can detect null dereferences at
 compile time, it should.
 
 
 Also, by "safe" I presume you mean "memory safe" which means free of
 memory corruption. Null pointer exceptions are memory safe. A null
 pointer could be caused by memory corruption, but it cannot *cause*
 memory corruption.

No, he's using the real meaning of "safe", not the misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

Memory safety is something that can be guaranteed (presuming the compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.

Okay, I'm gonna have to call you out on this one because it's simply incorrect. The halting problem deals with a valid program state - halting. We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program. Why do crashes have to be allowed? They're not an allowed instruction! A compiler can be Turing complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.
 You seem to be under the impression that nothing can be made
 uncrashable without introducing the possibility of corrupted state.
 That's hogwash.

I read that statement several times and I still don't understand what it means. BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.

PS: You can't convert segfaults into exceptions under Linux, as far as I know.
Sep 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
downs wrote:
PS: You can't convert segfaults into exceptions under Linux, as far as I know.

How did Jeremie do that? Andrei
Sep 27 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 downs wrote:
PS: You can't convert segfaults into exceptions under Linux, as far as I know.

How did Jeremie do that? Andrei

A signal handler using the undocumented kernel parameters attaches the signal context to the exception object, repairs the stack frame the kernel forges to make it look as if we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throws it. The error handling code unwinds down to the runtime's main(), where a catch clause is waiting for any Throwable and sends it on to the unhandled exception handler; a crash window appears with the backtrace, all finally blocks are executed, and the program shuts down gracefully. All I still need is an ELF/DWARF reader to extract symbolic debug info under Linux; it's already working for PE/CodeView on Windows. Jeremie
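For comparison, a handler that stays within what POSIX permits can still report the fault, just not by throwing. A minimal sketch, assuming a POSIX system: it installs a three-argument `SA_SIGINFO` handler for SIGSEGV (the third argument points to the interrupted `ucontext_t`), prints the faulting address from `si_addr` using only `write()`, then restores the default action and re-raises so the process still terminates. The names `format_hex`, `crash_reporter` and `install_crash_reporter` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Formats v as "0x..." into buf, which must hold 2 + 2 * sizeof(uintptr_t)
   bytes, and returns the length. Hand-rolled because snprintf() is not
   async-signal-safe. */
static size_t format_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, len = 0;
    do {
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return len;
}

/* SA_SIGINFO handlers receive three arguments; the third points to the
   interrupted context (a ucontext_t), passed as void *. */
static void crash_reporter(int sig, siginfo_t *info, void *uctx)
{
    static const char prefix[] = "fatal signal, fault address ";
    char hex[2 + 2 * sizeof(uintptr_t)];
    size_t len = format_hex((uintptr_t)info->si_addr, hex);
    ssize_t rc;
    (void)uctx;
    /* Only async-signal-safe calls here: no malloc, no printf, no throw. */
    rc = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    rc = write(STDERR_FILENO, hex, len);
    rc = write(STDERR_FILENO, "\n", 1);
    (void)rc;
    /* SIGSEGV is blocked while this handler runs, so the re-raised signal
       stays pending and kills the process (default action) on return. */
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_crash_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_reporter;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Re-raising with the default action, rather than calling `_exit()`, keeps the process's death recorded as a SIGSEGV, so a core dump is still possible.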
Sep 27 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 27/09/2009 19:29, Jeremie Pelletier wrote:
A signal handler with the undocumented kernel parameters attaches the signal context to the exception object, repairs the stack frame forged by the kernel to make us believe we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throw it. The error handling code will unwind down to the runtime's main() where a catch clause is waiting for any Throwables, sending them back into the unhandled exception handler, and a crash window appears with the backtrace, all finally blocks executed, and gracefully shutting down. All I need to do is an ELF/DWARF reader to extract symbolic debug info under linux, its already working for PE/CodeView on windows. Jeremie

Is this Linux-specific? What about other *nix systems, like BSD and Solaris?
Sep 27 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Yigal Chripun wrote:
Is this Linux-specific? What about other *nix systems, like BSD and Solaris?

Signal handlers are standard on most *nix platforms since they're part of the POSIX C standard library; some platforms may require special handling, but nothing impossible to do.
Sep 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jeremie Pelletier wrote:
 Is this Linux-specific? What about other *nix systems, like BSD and
 Solaris?

Signal handlers are standard on most *nix platforms since they're part of the POSIX C standard library; some platforms may require special handling, but nothing impossible to do.

Let me write a message on behalf of Sean Kelly. He wrote it to Walter and myself this morning; I suggested he post it, but he is probably off email for a short while. Hopefully the community will find a solution to the issue he's raising. Let me post this:

===================
Sean Kelly wrote:

There's one minor problem with his code. It's not safe to throw an exception from a signal handler. Here's a quote from the POSIX spec at opengroup.org:

"In order to prevent errors arising from interrupting non-reentrant function calls, applications should protect calls to these functions either by blocking the appropriate signals or through the use of some programmatic semaphore (see semget(), sem_init(), sem_open(), and so on). Note in particular that even the "safe" functions may modify errno; the signal-catching function, if not executing as an independent thread, may want to save and restore its value. Naturally, the same principles apply to the reentrancy of application routines and asynchronous data access. Note that longjmp() and siglongjmp() are not in the list of reentrant functions. This is because the code executing after longjmp() and siglongjmp() can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use longjmp() and siglongjmp() from within signal handlers require rigorous protection in order to be portable."

If this were an acceptable approach it would have been in druntime ages ago :-)
===================

Andrei
Sep 27 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote: There's one minor problem with his code. It's not safe to throw an exception from a signal handler.

Yes, but the segfault signal handler isn't meant for designing code that can live with these exceptions; it's just a feature that sends segfaults to the crash handler to get a backtrace dump. Even on Windows, where you can recover from access violations, it's generally a bad idea to turn bugs into features. Jeremie
Sep 27 2009
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Jeremie Pelletier (jeremiep gmail.com)'s article
Yes, but the segfault signal handler isn't meant for designing code that can live with these exceptions; it's just a feature that sends segfaults to the crash handler to get a backtrace dump. Even on Windows, where you can recover from access violations, it's generally a bad idea to turn bugs into features.

I don't think it's fair to compare Windows to Unix here because, as far as I know, Windows (ie. Win32, etc) was built with exceptions in mind (thanks to SEH), while Unix was not. So while the Windows kernel may theoretically be fine with an exception being thrown from within kernel code, this isn't true of Unix.

It's true that as long as only Errors are thrown (and thus that the app intends to terminate), things aren't as bad as they could be. Worst case, some mutex in libc is left locked or in some weird state and code executed during stack unwinding or when trying to report the error causes the app to hang instead of terminate. And this risk is somewhat mitigated because I'd expect most of these errors to occur within user code anyway.

One thing I'm not entirely sure about is whether the signal handler will always have a valid, C-style call stack tracing back into user code. These errors are triggered by hardware, and I really don't know what kind of tricks are common at that level of OS code. longjmp() doesn't have this problem because it doesn't care about the call stack--it just swaps some registers and executes a JMP. I don't suppose anyone here knows more about the feasibility of throwing exceptions from signal handlers at all? I'll ask around some OS groups and see what people say.
Sep 29 2009
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 One thing I'm not entirely sure about is whether the signal handler will always
 have a valid, C-style call stack tracing back into user code.  These errors are
 triggered by hardware, and I really don't know what kind of tricks are common
 at that level of OS code.  longjmp() doesn't have this problem because it
 doesn't care about the call stack--it just swaps some registers and executes a JMP.  I
 don't suppose anyone here knows more about the feasibility of throwing
 exceptions from signal handlers at all?  I'll ask around some OS groups and
 see what people say.

I was right, it is illegal to throw an exception from a signal handler. And worse, it's illegal to call malloc from a signal handler, so you can't safely create an exception object anyway. Heck, I'm not sure it's even safe to perform IO from a signal handler, so tracing directly from within the handler won't even work reliably. In short, while I'm totally fine with people using this in their own code, it's too unreliable to make an "official" solution by adding it to Druntime.
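The usual way around these restrictions is to have the handler only record that something happened and do the real work later, in ordinary code. A minimal sketch for an asynchronous signal (SIGUSR1, chosen purely for illustration); note that it does not by itself help with SIGSEGV, because returning from that handler just re-executes the faulting instruction. `note_usr1`, `install_usr1_note` and `take_usr1` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assigning to a volatile sig_atomic_t is what the C standard explicitly
   allows a signal handler to do. */
static volatile sig_atomic_t usr1_seen = 0;

/* The handler records that the signal happened and nothing else:
   no malloc, no I/O, no throw. */
static void note_usr1(int sig)
{
    (void)sig;
    usr1_seen = 1;
}

static int install_usr1_note(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Polled from ordinary code (e.g. a main loop), where allocating, logging
   and throwing are allowed again. A SIGUSR1 that lands between the test and
   the reset is folded into this one, which matches how standard
   (non-queued) signals behave anyway. */
static int take_usr1(void)
{
    if (!usr1_seen)
        return 0;
    usr1_seen = 0;
    return 1;
}
```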
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Sean Kelly wrote:

I was right, it is illegal to throw an exception from a signal handler. And worse, it's illegal to call malloc from a signal handler, so you can't safely create an exception object anyway. Heck, I'm not sure it's even safe to perform IO from a signal handler, so tracing directly from within the handler won't even work reliably. In short, while I'm totally fine with people using this in their own code, it's too unreliable to make an "official" solution by adding it to Druntime.

Weird, it works just fine for me. Maybe it's because the exception is always caught in the thread's entry point; I never tried to let such an exception unwind past the entry point. I haven't tried malloc or any I/O either. There still should be a way to grab the backtrace and context data from the hidden ucontext_* parameter and do something with it after returning from the signal handler. The whole idea of a crash handler is to limit the number of times you need to do postmortem debugging after a crash, or launch the process again within the debugger.
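One way to leave signal context while staying on the same thread is `sigsetjmp()`/`siglongjmp()`: the handler copies what it needs from `siginfo_t` into a preallocated record and jumps back to a recovery point, where ordinary code can build the report. This is a swapped-in technique, not Jeremie's exception-based one, and the POSIX text quoted earlier in the thread warns that `siglongjmp()` out of a handler is not reentrant-safe. Treat it as a last-gasp reporting path for faults in the program's own code, not as a way to keep running. `crash_record`, `run_guarded`, `no_fault` and `simulate_fault` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Preallocated, so the handler never needs malloc(). */
struct crash_record {
    int signo;
    int code;
    uintptr_t fault_addr; /* meaningful for a genuine fault */
};

static struct crash_record last_crash;
static sigjmp_buf recovery_point;

static void capture_and_jump(int sig, siginfo_t *info, void *uctx)
{
    (void)uctx; /* a fuller version might copy registers out of it */
    last_crash.signo = sig;
    last_crash.code = info->si_code;
    last_crash.fault_addr = (uintptr_t)info->si_addr;
    siglongjmp(recovery_point, 1); /* leave signal context */
}

/* Runs fn(). Returns 1 if fn() raised SIGSEGV (last_crash is filled in),
   0 if it returned normally, -1 if the handler could not be installed.
   The previous SIGSEGV disposition is restored either way. */
static int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    int crashed = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = capture_and_jump;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;
    if (sigsetjmp(recovery_point, 1) == 0) /* 1: also save signal mask */
        fn();
    else
        crashed = 1; /* ordinary code again: logging, malloc, etc. */
    sigaction(SIGSEGV, &old, NULL);
    return crashed;
}

static void no_fault(void) {}

/* Stand-in for a real bad memory access; raise() keeps the demo free of
   undefined behaviour. */
static void simulate_fault(void) { raise(SIGSEGV); }
```

With a real invalid access instead of `raise(SIGSEGV)`, `last_crash.fault_addr` would hold the offending address.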
Sep 29 2009
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Jeremie Pelletier (jeremiep gmail.com)'s article
Weird, it works just fine for me. Maybe it's because the exception is always caught in the thread's entry point; I never tried to let such an exception unwind past the entry point. I haven't tried malloc or any I/O either.

I think in practice, the issue is simply that malloc and IO routines aren't on the list of reentrant functions, so if a signal is called from within one of these routines then the signal handler trying to call the same routine could cause Bad Things to happen.

This actually comes up in our GC code on Linux because threads are suspended for the collection via signals. If one of these threads is suspended within a non-reentrant library routine and the GC code calls the same routine it can crash or deadlock on an internal mutex (the latter actually happened on OSX until I changed how GC works there). This is kind of a weird issue, since in this case any thread can screw with the GC thread, even though the GC thread itself never enters a signal handler. This is something that never occurred to me before--it was Fawzi that figured out why OSX apps were deadlocking for no reason whatsoever (I *think* this was pre-Druntime, though I can't recall precisely).

In short, you may never actually run into a problem using these functions, and if they work for you then that's all that matters. I'm just hesitant to roll something into Druntime that is "undefined" according to a spec and has only been verified to work through experimentation by a subset of D users. ie. I'd rather Druntime be a tad gimped and always work than be super fancy and not work for some people. YMMV.
 There still should be a way to grab the backtrace and context data from
 the hidden ucontext_* parameter and do something with it after returning
 from the signal handler.

Yeah, I saw one suggestion that you could have a thread blocked waiting for (in this case) backtrace data. So another thread could do the trace and no worries about signal handler limitations. Still, this seems like a pretty heavyweight approach. If there were some way to cache the trace data and then have the same thread process it I'd love to know how. I ran into this "can't throw exceptions from a signal handler" issue at a previous job, and finally gave up on the idea in frustration after not being able to come up with a decent workaround.
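The blocked-thread suggestion can be sketched with a pipe: `write()` is async-signal-safe, so the handler only pushes a byte, and a watcher thread blocked in `read()` does the heavyweight part outside signal context. A sketch under stated assumptions: POSIX threads, SIGUSR2 as the example signal, and hypothetical names (`report_pipe`, `watcher`, `start_watcher`). For a real SIGSEGV the handler could not simply return afterwards; it would have to wait for the watcher or terminate the process:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int report_pipe[2] = { -1, -1 }; /* [0] read end, [1] write end */
static int watcher_result = 0;          /* read only after pthread_join() */

/* Signal handler: only write() the signal number into the pipe. errno is
   saved and restored, as the POSIX text quoted in this thread advises. */
static void forward_to_watcher(int sig)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)sig;
    ssize_t rc = write(report_pipe[1], &byte, 1);
    (void)rc;
    errno = saved_errno;
}

/* Watcher thread: blocks in read(), then does the expensive work
   (symbolizing a backtrace, writing a log, ...) outside signal context. */
static void *watcher(void *arg)
{
    unsigned char byte;
    ssize_t n;
    (void)arg;
    do {
        n = read(report_pipe[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    if (n == 1)
        watcher_result = byte; /* stand-in for the real reporting work */
    return NULL;
}

static int start_watcher(pthread_t *tid, int sig)
{
    struct sigaction sa;
    if (pipe(report_pipe) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_to_watcher;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    return pthread_create(tid, NULL, watcher, NULL) == 0 ? 0 : -1;
}
```

The cost Sean mentions is visible here: one thread and one pipe kept alive for the whole run, just to handle an event that should almost never happen.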
 The whole idea of a crash handler is to limit the number of times you
 need to do postmortem debugging after a crash, or launch the process
 again within the debugger.

Yup. And as a server programmer, I think getting backtraces within a log file is totally awesome, since dealing with a core dump is difficult at best for such apps. In fact I'd probably use your approach within my own code, since it seems to work.
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Sean Kelly wrote:
I think in practice, the issue is simply that malloc and IO routines aren't on the list of reentrant functions, so if a signal is called from within one of these routines then the signal handler trying to call the same routine could cause Bad Things to happen. This actually comes up in our GC code on Linux because threads are suspended for the collection via signals. If one of these threads is suspended within a non-reentrant library routine and the GC code calls the same routine it can crash or deadlock on an internal mutex (the latter actually happened on OSX until I changed how GC works there). This is kind of a weird issue, since in this case any thread can screw with the GC thread, even though the GC thread itself never enters a signal handler. This is something that never occurred to me before--it was Fawzi that figured out why OSX apps were deadlocking for no reason whatsoever (I *think* this was pre-Druntime, though I can't recall precisely). In short, you may never actually run into a problem using these functions, and if they work for you then that's all that matters. I'm just hesitant to roll something into Druntime that is "undefined" according to a spec and has only been verified to work through experimentation by a subset of D users. ie. I'd rather Druntime be a tad gimped and always work than be super fancy and not work for some people. YMMV.

I agree, I don't mind occasional crashes within the crash handler itself if it ever comes to that; at that point things are already going pretty badly anyway, and the process is going to exit soon enough. It could be confusing as hell to library users if they don't know this might happen in rare cases, so I understand keeping it out of Druntime until a proven solution is found.
 There still should be a way to grab the backtrace and context data from
 the hidden ucontext_* parameter and do something with it after returning
 from the signal handler.

Yeah, I saw one suggestion that you could have a thread blocked waiting for (in this case) backtrace data. So another thread could do the trace and no worries about signal handler limitations. Still, this seems like a pretty heavyweight approach.

Eh, I'm not going that way either :) Maybe spawn another process with some basic info collected by the signal handler (ie registers, loaded modules and backtrace) and let that other process deal with generating a crash window while we gracefully shut down with a core dump. That's also a heavyweight idea, but it's only happening after a crash, not while waiting for one.
 If there were some way to cache the trace data and then have the same
 thread process it I'd love to know how.  I ran into this "can't throw
 exceptions from a signal handler" issue at a previous job, and finally
 gave up on the idea in frustration after not being able to come up with
 a decent workaround.
 
 The whole idea of a crash handler is to limit the number of times you
 need to do postmortem debugging after a crash, or launch the process
 again within the debugger.

Yup. And as a server programmer, I think getting backtraces within a log file is totally awesome, since dealing with a core dump is difficult at best for such apps. In fact I'd probably use your approach within my own code, since it seems to work.

Yeah, I'm not much into post-mortem debugging either; I like running within the debugger or having a convenient crash window. It's also a neat thing to use when you distribute your executable, since you can implement an SMTP mailer for the crash reports instead of the crash window.
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Sean Kelly wrote:
 == Quote from Jeremie Pelletier (jeremiep gmail.com)'s article
 Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Is this Linux-specific? What about other *nix systems, like BSD and
 Solaris?

of the POSIX C standard libraries; maybe some platforms will require special handling, but nothing impossible to do.

and myself this morning, and I suggested he post it, but he is probably off email for a short while. Hopefully the community will find a solution to the issue he's raising. Let me post this:

===================
Sean Kelly wrote:
There's one minor problem with his code. It's not safe to throw an exception from a signal handler. Here's a quote from the POSIX spec at opengroup.org:

"In order to prevent errors arising from interrupting non-reentrant function calls, applications should protect calls to these functions either by blocking the appropriate signals or through the use of some programmatic semaphore (see semget(), sem_init(), sem_open(), and so on). Note in particular that even the "safe" functions may modify errno; the signal-catching function, if not executing as an independent thread, may want to save and restore its value. Naturally, the same principles apply to the reentrancy of application routines and asynchronous data access. Note that longjmp() and siglongjmp() are not in the list of reentrant functions. This is because the code executing after longjmp() and siglongjmp() can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use longjmp() and siglongjmp() from within signal handlers require rigorous protection in order to be portable."

If this were an acceptable approach it would have been in druntime ages ago :-)
===================
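The usual way to honor the POSIX passage above is to keep the handler trivial: it records what happened in a lock-free variable and returns, and ordinary code does the heavy lifting (formatting, logging, throwing) afterwards. The following is a minimal D sketch of that pattern, assuming a modern druntime (`core.stdc.signal`, `core.atomic`); the names `pendingSignal`, `recordSignal`, `takePendingSignal` and `raiseAndCollect` are hypothetical. It only fits asynchronous signals. For a hardware fault such as SIGSEGV, POSIX leaves returning from the handler undefined, and on Linux the faulting instruction is typically re-executed, so this pattern does not by itself solve the segfault-backtrace problem.

```d
import core.atomic : atomicLoad, atomicStore;
import core.stdc.signal : raise, signal, SIGTERM;
import std.stdio : writeln;

// Last signal seen by the handler; 0 means "nothing pending".
shared int pendingSignal;

// Handler: only stores the signal number. No allocation, no I/O, no throw.
extern (C) void recordSignal(int sig) nothrow @nogc
{
    atomicStore(pendingSignal, sig);
}

// Ordinary code: fetch and clear the pending signal.
// (A signal arriving between the load and the store would be lost;
// acceptable for a sketch.)
int takePendingSignal()
{
    immutable sig = atomicLoad(pendingSignal);
    atomicStore(pendingSignal, 0);
    return sig;
}

// Install the handler, raise `sig` on this thread, then collect it.
// raise() does not return until the handler has run.
int raiseAndCollect(int sig)
{
    signal(sig, &recordSignal);
    raise(sig);
    return takePendingSignal();
}

void main()
{
    immutable sig = raiseAndCollect(SIGTERM);
    // Outside the handler it is fine to format messages, log, or throw.
    writeln("handled signal ", sig, " outside the handler");
}
```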

live with these exceptions; it's just a feature to allow segfaults to be sent to the crash handler to get a backtrace dump. Even on Windows, where you can recover from access violations, it's generally a bad idea to allow bugs to be turned into features.

I don't think it's fair to compare Windows to Unix here because, as far as I know, Windows (i.e. Win32, etc.) was built with exceptions in mind (thanks to SEH), while Unix was not. So while the Windows kernel may theoretically be fine with an exception being thrown from within kernel code, this isn't true of Unix. It's true that as long as only Errors are thrown (and thus that the app intends to terminate), things aren't as bad as they could be. Worst case, some mutex in libc is left locked or in some weird state and code executed during stack unwinding or when trying to report the error causes the app to hang instead of terminate. And this risk is somewhat mitigated because I'd expect most of these errors to occur within user code anyway. One thing I'm not entirely sure about is whether the signal handler will always have a valid, C-style call stack tracing back into user code. These errors are triggered by hardware, and I really don't know what kind of tricks are common at that level of OS code. longjmp() doesn't have this problem because it doesn't care about the call stack; it just swaps some registers and executes a JMP. I don't suppose anyone here knows more about the feasibility of throwing exceptions from signal handlers at all? I'll ask around some OS groups and see what people say.

I haven't had any problems so far; the stack trace generated was always valid and similar to what gdb would output. But I agree that trying to recover from these exceptions is a *bad* idea in so many ways. From what I know, the kernel alters the stack frame of the signal handler to make us believe we called it ourselves. Returning from the signal handler therefore jumps to the routine from which the signal was originally raised, without the kernel being aware of it. This is a bit different from how SEH is handled, but has a lot in common with it. From the research I did about SEH internals, it's just built on top of interrupt handlers: the hardware raises an exception (access violation, etc.) and jumps into a kernel handler for the corresponding interrupt; there it looks up the base of the stack for a pointer to a struct containing a handler function and a handler table, which is set and restored by try blocks, and calls the exception handler (_d_framehandler in our case) with the appropriate parameters. From there the kernel decides what to do based on the return code of the framehandler. The signal handler model is therefore quite acceptable to build exception handling on top of. We just may want to also manually generate a core dump before throwing the exception to support postmortem debugging.
Sep 29 2009
prev sibling parent reply downs <default_357-line yahoo.de> writes:
Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 downs wrote:
 Walter Bright wrote:
 Nick Sabalausky wrote:

 I agree with you that if the compiler can detect null dereferences at
 compile time, it should.


 Also, by "safe" I presume you mean "memory safe" which means free of
 memory corruption. Null pointer exceptions are memory safe. A null
 pointer could be caused by memory corruption, but it cannot *cause*
 memory corruption.

misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.

Okay, I'm gonna have to call you out on this one because it's simply incorrect. The halting problem deals with a valid program state: halting. We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program. Why do crashes have to be allowed? They're not an allowed instruction! A compiler can be Turing complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.
 You seem to be under the impression that nothing can be made
 uncrashable without introducing the possibility of corrupted state.
 That's hogwash.

what it means. BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.

PS: You can't convert segfaults into exceptions under Linux, as far as I know.

How did Jeremie do that? Andrei

A signal handler with the undocumented kernel parameters attaches the signal context to the exception object, repairs the stack frame forged by the kernel to make us believe we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throws it. The error handling code will unwind down to the runtime's main() where a catch clause is waiting for any Throwables, sending them back into the unhandled exception handler, and a crash window appears with the backtrace, all finally blocks executed, and gracefully shutting down. All I still need is an ELF/DWARF reader to extract symbolic debug info under Linux; it's already working for PE/CodeView on Windows. Jeremie

Woah, nice. I stand corrected. Is this in druntime already?
Sep 27 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
downs wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 downs wrote:
 Walter Bright wrote:
 Nick Sabalausky wrote:

 I agree with you that if the compiler can detect null dereferences at
 compile time, it should.


 Also, by "safe" I presume you mean "memory safe" which means free of
 memory corruption. Null pointer exceptions are memory safe. A null
 pointer could be caused by memory corruption, but it cannot *cause*
 memory corruption.

misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.

incorrect. The halting problem deals with a valid program state: halting. We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program. Why do crashes have to be allowed? They're not an allowed instruction! A compiler can be Turing complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.
 You seem to be under the impression that nothing can be made
 uncrashable without introducing the possibility of corrupted state.
 That's hogwash.

what it means. BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.

as I know.

Andrei

signal context to the exception object, repairs the stack frame forged by the kernel to make us believe we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throws it. The error handling code will unwind down to the runtime's main() where a catch clause is waiting for any Throwables, sending them back into the unhandled exception handler, and a crash window appears with the backtrace, all finally blocks executed, and gracefully shutting down. All I still need is an ELF/DWARF reader to extract symbolic debug info under Linux; it's already working for PE/CodeView on Windows. Jeremie

Woah, nice. I stand corrected. Is this in druntime already?

Not yet; it's part of a custom runtime I'm working on and wish to release under a public domain license when I get the time. The code is linked from a thread in D.announce.
Sep 27 2009
parent grauzone <none example.net> writes:
Jeremie Pelletier wrote:
 downs wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 downs wrote:
 Walter Bright wrote:
 Nick Sabalausky wrote:

 I agree with you that if the compiler can detect null dereferences at
 compile time, it should.


 Also, by "safe" I presume you mean "memory safe" which means 
 free of
 memory corruption. Null pointer exceptions are memory safe. A null
 pointer could be caused by memory corruption, but it cannot *cause*
 memory corruption.

misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.

incorrect. The halting problem deals with a valid program state: halting. We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program. Why do crashes have to be allowed? They're not an allowed instruction! A compiler can be Turing complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.
 You seem to be under the impression that nothing can be made
 uncrashable without introducing the possibility of corrupted state.
 That's hogwash.

what it means. BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.

as I know.

Andrei

signal context to the exception object, repairs the stack frame forged by the kernel to make us believe we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throws it. The error handling code will unwind down to the runtime's main() where a catch clause is waiting for any Throwables, sending them back into the unhandled exception handler, and a crash window appears with the backtrace, all finally blocks executed, and gracefully shutting down. All I still need is an ELF/DWARF reader to extract symbolic debug info under Linux; it's already working for PE/CodeView on Windows. Jeremie

Woah, nice. I stand corrected. Is this in druntime already?

Not yet; it's part of a custom runtime I'm working on and wish to release under a public domain license when I get the time. The code is linked from a thread in D.announce.

Some of this functionality is also in Tango (SVN version). Signals are caught only to print a backtrace.
Sep 27 2009
prev sibling parent BCS <none anon.com> writes:
Hello downs,

 PS: You can't convert segfaults into exceptions under Linux, as far as
 I know.
 

Last I checked, throwing from a signal handler works on Linux.
Sep 27 2009
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
On 27-9-2009 9:20, Walter Bright wrote:
 language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch
 these errors on compile time. Sure, the code is a bit harder to write,
 but it is safe and never segfaults. The idea is to minimize the amount
 of runtime errors of all sorts. That's also how other features of
 statically typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them. Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.

    // t.d
    void main()
    {
        int* a;
        a[20000] = 2;
    }

    [C:\Users\Lionello] dmd -run t.d
    [C:\Users\Lionello]

This code passes on Vista. Granted, it needs a big enough offset and some luck, but indexing null will never be secure in the current flat memory models.

L.
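To put a number on "a big enough offset": indexing a null `int*` by 20000 touches address `20000 * int.sizeof`, i.e. 80000 (0x13880). Typically only a small region at address 0 is kept unmapped to trap null dereferences; the 64 KiB figure used here is an assumption (Windows reserves the first 64 KiB, and many Linux distributions set `vm.mmap_min_addr` to 65536). A write past that region can land in mapped memory instead of faulting. This sketch computes the target address arithmetically without dereferencing anything; `guardSize`, `nullIndexAddress` and `trapsOnNull` are hypothetical names.

```d
import std.stdio : writefln;

// Assumed size of the unmapped region at address 0 (64 KiB).
enum size_t guardSize = 64 * 1024;

// Address that `(cast(T*) null)[index]` would touch, computed arithmetically
// so nothing is actually dereferenced.
size_t nullIndexAddress(T)(size_t index)
{
    return index * T.sizeof;
}

// True if such an access would still fall inside the assumed guard region.
bool trapsOnNull(T)(size_t index)
{
    return nullIndexAddress!T(index) < guardSize;
}

void main()
{
    immutable size_t[] indices = [0, 100, 20_000];
    foreach (index; indices)
        writefln("a[%s] -> address 0x%x, traps: %s",
                 index, nullIndexAddress!int(index), trapsOnNull!int(index));
}
```

With the assumed 64 KiB guard, `a[100]` would still fault, while `a[20000]` would not be caught by the hardware.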
Sep 27 2009
parent reply Max Samukha <spambox d-coding.com> writes:
Lionello Lunesu wrote:

 On 27-9-2009 9:20, Walter Bright wrote:
 language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch
 these errors on compile time. Sure, the code is a bit harder to write,
 but it is safe and never segfaults. The idea is to minimize the amount
 of runtime errors of all sorts. That's also how other features of
 statically typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them. Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.

    // t.d
    void main()
    {
        int* a;
        a[20000] = 2;
    }

    [C:\Users\Lionello] dmd -run t.d
    [C:\Users\Lionello]

This code passes on Vista. Granted, it needs a big enough offset and some luck, but indexing null will never be secure in the current flat memory models.

L.

That is a strong argument. If an object is big enough, modifying it via a null reference may still cause memory corruption. Initializing references to null does not guarantee memory safety.
Sep 28 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Max Samukha wrote:
 Lionello Lunesu wrote:
 
 On 27-9-2009 9:20, Walter Bright wrote:
 language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch
 these errors on compile time. Sure, the code is a bit harder to write,
 but it is safe and never segfaults. The idea is to minimize the amount
 of runtime errors of all sorts. That's also how other features of
 statically typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them. Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.

    void main()
    {
        int* a;
        a[20000] = 2;
    }

    [C:\Users\Lionello] dmd -run t.d
    [C:\Users\Lionello]

This code passes on Vista. Granted, it needs a big enough offset and some luck, but indexing null will never be secure in the current flat memory models.

L.

That is a strong argument. If an object is big enough, modifying it via a null reference may still cause memory corruption. Initializing references to null does not guarantee memory safety.

How is that corruption? These pointers were purposely set to 0x00000002. Corruption, I believe, is when memory is modified without the programmer being aware of it. For example, if the GC were to free memory that is still reachable, that would cause corruption. Corruption is near impossible to trace back; this case is trivial.
Sep 28 2009
parent Lionello Lunesu <lio lunesu.remove.com> writes:
On 28-9-2009 18:09, Jeremie Pelletier wrote:
 Max Samukha wrote:
 Lionello Lunesu wrote:

 On 27-9-2009 9:20, Walter Bright wrote:
 language_fan wrote:
 The idea behind non-nullable types and other contracts is to catch
 these errors on compile time. Sure, the code is a bit harder to write,
 but it is safe and never segfaults. The idea is to minimize the amount
 of runtime errors of all sorts. That's also how other features of
 statically typed languages work.

I certainly agree that catching errors at compile time is preferable by far. Where I disagree is the notion that non-nullable types achieve this. I've argued extensively here that they hide errors, not fix them. Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.

    void main()
    {
        int* a;
        a[20000] = 2;
    }

    [C:\Users\Lionello] dmd -run t.d
    [C:\Users\Lionello]

This code passes on Vista. Granted, it needs a big enough offset and some luck, but indexing null will never be secure in the current flat memory models.

L.

That is a strong argument. If an object is big enough, modifying it via a null reference may still cause memory corruption. Initializing references to null does not guarantee memory safety.

How is that corruption? These pointers were purposely set to 0x00000002. Corruption, I believe, is when memory is modified without the programmer being aware of it. For example, if the GC were to free memory that is still reachable, that would cause corruption. Corruption is near impossible to trace back; this case is trivial.

Uh? What pointer is being set to 0x00000002? I'm indexing an array that happens to be uninitialized, which means: null. The code passes without problems, but modifies a 'random' address, with unpredictable consequences. According to Walter a compile time check is not needed, because at run-time it is guaranteed that the program will abort when a null pointer is about to be used. But, that's not always the case, see my example. L.
Sep 28 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 The problem with non-nullable references is what do they default to? 
 Some "nan" object? When you use a "nan" object, what should it do? Throw 
 an exception?

This is the mistake. There would be no way to default-initialize a non-null object. I'm surprised you are still saying this, because we discussed how NonNull!T could be implemented by disabling its default constructor. Andrei
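A minimal sketch of the library approach mentioned here, assuming the modern D `@disable this();` syntax: a hypothetical `NonNull!T` struct with no default constructor, so a declaration without an initializer is rejected at compile time, and whose only field is the reference itself, so it is the same size as a plain reference. The constructor's `assert` is only a debug-time check (removed with `-release`), and `NonNull!C.init` still holds null, a known loophole of a library wrapper compared with a language-level rule.

```d
// Hypothetical wrapper: a class reference that is never default-initialized to null.
struct NonNull(T) if (is(T == class) || is(T == interface))
{
    private T payload;

    // No default constructor: `NonNull!C x;` does not compile.
    @disable this();

    this(T value)
    {
        assert(value !is null, "NonNull initialized with null");
        payload = value;
    }

    @property inout(T) get() inout { return payload; }

    // Forward member access to the wrapped reference.
    alias get this;
}

// Convenience constructor with type inference.
NonNull!T nonNull(T)(T value)
{
    return NonNull!T(value);
}

class C
{
    int fun() { return 42; }
}

// Same size as a plain reference: the wrapper adds no runtime storage.
static assert(NonNull!C.sizeof == C.sizeof);

// The compiler rejects a declaration that would need a default value.
static assert(!__traits(compiles, { NonNull!C x; }));

void main()
{
    auto c = nonNull(new C);
    assert(c.fun() == 42);
}
```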
Sep 26 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 The problem with non-nullable references is what do they default to? 
 Some "nan" object? When you use a "nan" object, what should it do? 
 Throw an exception?

This is the mistake. There would be no way to default-initialize a non-null object. I'm surprised you are still saying this, because we discussed how NonNull!T could be implemented by disabling its default constructor.

Sure, so the user just provides "0" as the argument to the non-default constructor. Or he writes:

    C c = c_empty;

using c_empty as his placeholder for an empty object. Now, what happens with:

    c.foo();

Should c_empty throw an exception? To take this a little farther, suppose I wish to create an array of C that I will partially fill with valid data, and leave some empty slots. Those empty slots I stuff with c_empty, to avoid having nulls. What is c_empty's proper behavior if I mistakenly try to access its members?

Forcing the user to provide an initializer does not solve the problem. The crucial point is that the problem is *not* the seg fault; the seg fault is the symptom. The problem is that the user has not set the object to a value that his program's logic requires.

I am also perfectly happy with NonNull being a type constructor, to be used where appropriate. My disagreement is with the notion that null references should be eliminated at the language level.
Sep 26 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 The problem with non-nullable references is what do they default to? 
 Some "nan" object? When you use a "nan" object, what should it do? 
 Throw an exception?

This is the mistake. There would be no way to default-initialize a non-null object. I'm surprised you are still saying this, because we discussed how NonNull!T could be implemented by disabling its default constructor.

Sure, so the user just provides "0" as the argument to the non-default constructor. Or he writes:

    C c = c_empty;

using c_empty as his placeholder for an empty object. Now, what happens with:

    c.foo();

Should c_empty throw an exception?

The problem is you keep on insisting on one case "I have a non-null reference that I don't have an initializer for, but the compiler forces me to find one, so I'll just throw a crappy value in." This focus on one situation comes straight with your admitted bad habit of defining variables in one place and initializing in another. The situation you need to open a curious eye on is "I have a reference that's never supposed to be null, but I forgot about initializing it and the compiler silently put a useless null in it." The simplest case is what _every_ D beginner has done:

    T x;
    x.fun();

to witness a crash. Why the hell does that crash? It did work when T was a struct. (Also this damns generic code to hell.)

So again: focus on the situation when people forget to initialize references that are never supposed to be null.

That has happened to me, and I'm supposed to know about this stuff. And one thing you don't understand is that on Linux, access violations are much more difficult to figure out than others. On a computing cluster it gets one order of magnitude more difficult. So spare me of your Windows setup that launches your debugger on the line of the crash. For better or worse, many don't have that. People sometimes have problems that you don't have, and you need to put yourself in their shoes.
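Andrei's two-line example can be expanded into a small D sketch of why a null default hurts generic code: the same template body gets a usable value when T is a struct but a null reference when T is a class. The types `S` and `K` and the function `callOnDefault` are hypothetical, and the explicit null check exists only so the sketch runs instead of crashing.

```d
import std.stdio : writeln;

struct S { int fun() { return 1; } }
class K { int fun() { return 2; } }

// Generic code that default-initializes a T and immediately uses it.
int callOnDefault(T)()
{
    T x;                      // S: S.init, a valid value; K: null
    static if (is(T == class))
    {
        if (x is null)
            return -1;        // without this check, x.fun() would crash here
    }
    return x.fun();
}

// The default value of a class reference is null.
static assert(K.init is null);

void main()
{
    writeln(callOnDefault!S()); // 1
    writeln(callOnDefault!K()); // -1
}
```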
 To take this a little farther, 
 suppose I wish to create an array of C that I will partially fill with 
 valid data, and leave some empty slots. Those empty slots I stuff with 
 c_empty, to avoid having nulls. What is c_empty's proper behavior if I 
 mistakenly try to access its members?

You make an array of nullable references. Again you confuse having non-null as a default with having non-null as the only option.
 Forcing the user to provide an initializer does not solve the problem. 
 The crucial point is the problem is *not* the seg fault, the seg fault 
 is the symptom. The problem is the user has not set the object to a 
 value that his program's logic requires.
 
 
 I am also perfectly happy with NonNull being a type constructor, to be 
 used where appropriate. My disagreement is with the notion that null 
 references should be eliminated at the language level.

Null references shouldn't be eliminated from the language. They just should NOT be the default. I guess I'm going to say that until you tune in to my station. Andrei
Sep 26 2009
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Andrei Alexandrescu wrote:
 [snip]
 The problem is you keep on insisting on one case "I have a non-null 
 reference that I don't have an initializer for, but the compiler forces 
 me to find one, so I'll just throw a crappy value in." This focus on one 
 situation comes straight with your admitted bad habit of defining 
 variables in one place and initializing in another. The situation you 
 need to open a curious eye on is "I have a reference that's never 
 supposed to be null, but I forgot about initializing it and the compiler 
 silently put a useless null in it." The simplest case is what _every_ D 
 beginner has done:
 
 T x;
 x.fun();
 
 to witness a crash. Why the hell does that crash? It did work when T was 
 a struct. (Also this damns generic code to hell.)
 
 So again: focus on the situation when people forget to initialize 
 references that are never supposed to be null.
 
 That has happened to me, and I'm supposed to know about this stuff. And 
 one thing you don't understand is that on Linux, access violations are 
 much more difficult to figure out than others. On a computing cluster it
 gets one order of magnitude more difficult. So spare me of your Windows 
 setup that launches your debugger on the line of the crash. For better 
 or worse, many don't have that. People sometimes have problems that you 
 don't have, and you need to put yourself in their shoes.

Quoted for truth. -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Sep 26 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 The problem is you keep on insisting on one case "I have a non-null 
 reference that I don't have an initializer for, but the compiler forces 
 me to find one, so I'll just throw a crappy value in." This focus on one 
 situation comes straight with your admitted bad habit of defining 
 variables in one place and initializing in another.

Thank you, Andrei, for your good efforts in trying to shed some light on this topic. I think we are converging :-) But I think you have to deal with the example shown by Jeremie Pelletier too; this was my answer: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=96834 (What I have written in the last line is confused. I meant that the type system doesn't allow you to read or access an object before it's initialized. This looks like flow analysis, but there are ways to simplify/constrain the situation enough, for example with that enforce scope block.) Bye, bearophile
Sep 26 2009
prev sibling parent BCS <none anon.com> writes:
Hello Walter,

 The problem with non-nullable references is what do they default to?
 Some "nan" object? When you use a "nan" object, what should it do?
 Throw an exception?
 

They don't have a default. Their semantics would be such that the compiler rejects as illegal any code that would require it to supply a default. As to the user stuffing "c_empty" in just to get the compiler to shut up: firstly, that says the variable should not yet be declared, as you don't yet know what value to give it; and secondly, either c_empty is a rational value or the user is subverting the type system and is on their own.
Sep 26 2009
prev sibling next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 5:29 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 
 I actually side with Walter here. I much prefer my programs to crash on
 using a null reference and fix the issue than add runtime overhead that does
 the same thing. In most cases a simple backtrace is enough to pinpoint the
 location of the bug.

There is NO RUNTIME OVERHEAD in implementing nonnull reference types. None. It's handled entirely by the type system. Can we please move past this?
 Null references are useful to implement optional arguments without any
 overhead by an Optional!T wrapper. If you disallow null references what
 would "Object foo;" initialize to then?

It wouldn't. The compiler wouldn't allow it. It would force you to initialize it. That is the entire point of nonnull references.

How would you do this then?

    void foo(int a) {
        Object foo;
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = new Object2;
        else foo = new Object3;
        foo.doSomething();
    }

The compiler would just die on the first line of the method where foo is null. What about "int a;": should this throw an error too? Or "float f;"? What about standard pointers? I can think of so many algorithms that rely on pointers possibly being null. Maybe this could be a case to add in SafeD but leave out in standard D. I wouldn't want a nonnull reference type; I use nullables just too often.
Sep 26 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:

 Jeremie Pelletier:
 How would you do this then?

 void foo(int a) {
        Object foo;
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = new Object2;
        else foo = new Object3;
        foo.doSomething();
 }

 The compiler would just die on the first line of the method where foo is
 null.

Either use Object? (a nullable reference), or factor out the object creation - use a separate method or something.

Using a separate function to initialize a nonnull reference is a possible solution, but we can invent nicer solutions too. You can have a function inside which an object is nullable but which returns a nonnull reference; see the enforce() used by Denis Koroskin. (The compiler also has to recognize an if (foo is null) {...} as a possible "enforce".) Another possible solution is to use something like a Python "with" block that assures something is done when the block is done:

    enforce (Object foo) {
        // foo is nonnull, but inside here it's in a limbo
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = new Object2;
        else foo = new Object3;
    } // when the enforce block ends foo must be initialized
    foo.doSomething();

Probably there are other possible solutions. A better solution is to just allow foo to be undefined until it's written over. To simplify analysis it has to be defined when the scope ends. Bye, bearophile
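Phobos has a runtime counterpart to the "nullable inside, non-null outside" idea: `std.exception.enforce` returns its argument unchanged when it tests true (for a class reference, when it is non-null) and throws an `Exception` otherwise, so a lookup that may yield null can be checked once at a boundary. This is only an analogy with a runtime check, not the compile-time guarantee discussed above, and whether it matches the enforce() Denis Koroskin posted is not known from this thread; `Widget`, `find` and `findOrThrow` are hypothetical names.

```d
import std.exception : collectException, enforce;
import std.stdio : writeln;

class Widget {}

// Lookup that may fail and return null.
Widget find(string name)
{
    return name == "ok" ? new Widget : null;
}

// Boundary function: callers get either a real Widget or an exception,
// never a null reference.
Widget findOrThrow(string name)
{
    return enforce(find(name), "no widget named " ~ name);
}

void main()
{
    assert(findOrThrow("ok") !is null);
    auto e = collectException(findOrThrow("missing"));
    writeln(e.msg); // no widget named missing
}
```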
Sep 26 2009
prev sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 27/09/2009 00:59, Jeremie Pelletier wrote:
 Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 5:29 PM, Jeremie Pelletier
 <jeremiep gmail.com> wrote:

 I actually side with Walter here. I much prefer my programs to crash on
 using a null reference and fix the issue than add runtime overhead
 that does
 the same thing. In most cases a simple backtrace is enough to
 pinpoint the
 location of the bug.

There is NO RUNTIME OVERHEAD in implementing nonnull reference types. None. It's handled entirely by the type system. Can we please move past this?
 Null references are useful to implement optional arguments without any
 overhead by an Optional!T wrapper. If you disallow null references what
 would "Object foo;" initialize to then?

It wouldn't. The compiler wouldn't allow it. It would force you to initialize it. That is the entire point of nonnull references.

How would you do this then?

    void foo(int a) {
        Object foo;
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = new Object2;
        else foo = new Object3;
        foo.doSomething();
    }

The compiler would just die on the first line of the method where foo is null. What about "int a;": should this throw an error too? Or "float f;"? What about standard pointers? I can think of so many algorithms that rely on pointers possibly being null. Maybe this could be a case to add in SafeD but leave out in standard D. I wouldn't want a nonnull reference type; I use nullables just too often.

With current D syntax this can be implemented as:

    void foo(int a) {
        Object foo = (a == 1) ? new Object1 : (a == 2) ? new Object2 : new Object3;
        foo.doSomething();
    }

The above also agrees with what Denis said about possible uninitialized-variable bugs. In D, "if" is the same as in C: a procedural statement. I personally think that it should be an expression, like in FP languages, which is safer.

To reinforce what others have said already:

1) Non-null references *by default* do not affect nullable references in any way and do not add any overhead. The idea is to make the *default* the *safer* option, which is one of the primary goals of this language.

2) There is no default value for non-nullable references. You must initialize it to a correct, logical value *always*. If you resort to some "default" value you are doing something wrong.

BTW, C++ references implement this idea already. Functions that return a reference will throw an exception on error (Walter's canary) while the same function that returns a pointer will usually just return null on error.

Segfaults are *NOT* a good mechanism to handle errors. An exception trace gives you a whole lot more information about what went wrong and where compared to a segfault.
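The other route suggested in the thread, factoring the object creation out into a separate function, can be sketched so that the variable is initialized at its declaration and every path yields a real object, which is the shape a non-null-by-default rule would accept. `Doer`, `makeObject` and the strings returned by `doSomething` are hypothetical; `Object1`, `Object2` and `Object3` follow the names used in the thread.

```d
import std.stdio : writeln;

interface Doer { string doSomething(); }
class Object1 : Doer { string doSomething() { return "one"; } }
class Object2 : Doer { string doSomething() { return "two"; } }
class Object3 : Doer { string doSomething() { return "three"; } }

// Every path returns a freshly constructed object, never null.
Doer makeObject(int a)
{
    if (a == 1) return new Object1;
    if (a == 2) return new Object2;
    return new Object3;
}

void foo(int a)
{
    Doer obj = makeObject(a); // initialized where it is declared
    writeln(obj.doSomething());
}

void main()
{
    foo(1); // one
    foo(7); // three
}
```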
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Yigal Chripun wrote:
 On 27/09/2009 00:59, Jeremie Pelletier wrote:
 Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 5:29 PM, Jeremie Pelletier
 <jeremiep gmail.com> wrote:

 I actually side with Walter here. I much prefer my programs to crash on
 using a null reference and fix the issue than add runtime overhead
 that does
 the same thing. In most cases a simple backtrace is enough to
 pinpoint the
 location of the bug.

There is NO RUNTIME OVERHEAD in implementing nonnull reference types. None. It's handled entirely by the type system. Can we please move past this?
 Null references are useful to implement optional arguments without any
 overhead by an Optional!T wrapper. If you disallow null references what
 would "Object foo;" initialize to then?

It wouldn't. The compiler wouldn't allow it. It would force you to initialize it. That is the entire point of nonnull references.

How would you do this then?

    void foo(int a) {
        Object foo;
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = Object2;
        else foo = Object3;
        foo.doSomething();
    }

The compiler would just die on the first line of the method where foo is null. What about "int a;"? Should this throw an error too? Or "float f;"? What about standard pointers? I can think of so many algorithms that rely on pointers possibly being null. Maybe this could be a case to add in SafeD but leave out of standard D. I wouldn't want a nonnull reference type; I use nullables just too often.

With current D syntax this can be implemented as:

    void foo(int a) {
        Object foo = (a == 1) ? new Object1
                   : (a == 2) ? Object2
                   : Object3;
        foo.doSomething();
    }

The above also agrees with what Denis said about possible uninitialized-variable bugs. In D, "if" is the same as in C: a procedural statement. I personally think it should be an expression, as in FP languages, which is safer.

To reinforce what others have already said:
1) Non-null references *by default* do not affect nullable references in any way and do not add any overhead. The idea is to make the *default* the *safer* option, which is one of the primary goals of this language.
2) There is no default value for non-nullable references. You must *always* initialize them to a correct, logical value. If you resort to some "default" value, you are doing something wrong.

BTW, C++ references already implement this idea. Functions that return a reference will throw an exception on error (Walter's canary), while the same function returning a pointer will usually just return null on error.

Segfaults are *NOT* a good mechanism for handling errors. An exception trace gives you a lot more information about what went wrong and where than a segfault does.

This is something for the runtime or the debugger to deal with. My runtime converts access violations on Windows or segfaults on Linux into exception objects, which unwind all the way down to main where it catches into the unhandled exception handler (or crash handler) and I get a neat popup with a "hello, your program crashed at this point, here is a backtrace with resolved symbols and filenames along with current registers and loaded modules, would you like a cup of coffee while you solve the problem?". I sent that crash handler to D.announce last week too.

The compiler won't be able to enforce *every* nonnull reference and segfaults are bound to happen, especially with casting. While it may prevent most of them, any good programmer would too; I don't remember the last time I had a segfault on a null reference actually.

I can see what the point is with nonnull references, but I can also see it's not a bulletproof solution, i.e. "Object foo = cast(Object)null;" would easily bypass the nonnull enforcement, resulting in a segfault the system is trying to avoid.

What about function parameters? A lot of parameters are optional references, which are tested and then used into functions whose parameters aren't optional. It would result in a lot of casts, something that could easily confuse people and easily generate segfaults.

All I'm saying is, nonnull references would just take the issue from one place to another. Like Walter said, you can put a gas mask to ignore the room full of toxic gas, but that doesn't solve the gas problem in itself, you're just denying it exists. Then someday you forget about it, remove the mask, and suffocate.

Jeremie
Sep 26 2009
bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 I don't remember 
 the last time I had a segfault on a null reference actually.

I have read that null dereference bugs are among the most common problems in Java/C# code. I have no pointers...
 I can see what the point is with nonnull references, but I can also see 
 it's not a bulletproof solution, i.e. "Object foo = cast(Object)null;" 
 would easily bypass the nonnull enforcement, resulting in a segfault the 
 system is trying to avoid.

That's life.
 What about function parameters, a lot of parameters are optional 
 references, which are tested and then used into functions whose 
 parameters aren't optional. It would result in a lot of casts, something 
 that could easily confuse people and easily generate segfaults.

By "optional" I think you mean "nullable" there. Note that some of those casts can be avoided, because the nonnull nature of a reference can be implicitly inferred by the compiler:

    Foo somefunction(Foo? foo) {
        if (foo is null) {
            ... // do something
        } else {
            // here foo can be implicitly converted to
            // a nonnullable reference, because the compiler
            // can infer that here foo can never be null.
            return foo;
        }
    }
 All I'm saying is, nonnull references would just take the issue from 
 one place to another. Like Walter said, you can put a gas mask to ignore 
 the room full of toxic gas, but that doesn't solve the gas problem in 
 itself, you're just denying it exists. Then someday you forget about 
 it, remove the mask, and suffocate.

No solution is perfect, so it's a matter of weighing its pros and cons. It's hard to tell how good a feature is before trying it. That's why I have half-seriously proposed implementing nonnullables in a branch of D2, testing it, and keeping it only if it turns out to be good.

Bye,
bearophile
Sep 26 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Jeremie Pelletier wrote:
 ...
 
 This is something for the runtime or the debugger to deal with. My
 runtime converts access violations on windows or segfaults on linux into
 exception objects, which unwind all the way down to main where it
 catches into the unhandled exception handler (or crash handler) and I
 get a neat popup with a "hello, your program crashed at this point, here
 is a backtrace with resolved symbols and filenames along with current
 registers and loaded modules, would you like a cup of coffee while you
 solve the problem?". I sent that crash handler to D.announce last week too.

See my long explanation that NPEs are only symptoms; very rarely do they put up a big sign saying "what ho; the problem is RIGHT HERE!"
 The compiler won't be able to enforce *every* nonnull reference and
 segfaults are bound to happen, especially with casting. While it may
 prevent most of them, any good programmer would too, I don't remember
 the last time I had a segfault on a null reference actually.

I do. It took a day and a half to track it back to the source.
 I can see what the point is with nonnull references, but I can also see
 it's not a bulletproof solution, i.e. "Object foo = cast(Object)null;"
 would easily bypass the nonnull enforcement, resulting in a segfault the
 system is trying to avoid.

Why lock the door when someone could break the window?

Why have laws when people could break them?

Why build a wall when someone could park a hydrogen bomb next to it?

Why have a typesystem when you could use casting to put the float representation of 3.14159 into a void* and then dereference it?

Casting is not an argument against non-null references because casting can BREAK ANYTHING.

"Doctor, it hurts when I hammer nails into my shin."

"So stop doing it."
 What about function parameters, a lot of parameters are optional
 references, which are tested and then used into functions whose
 parameters aren't optional. It would result in a lot of casts, something
 that could easily confuse people and easily generate segfaults.

So what you're saying is: better to never, ever do error checking and just start fixing things after they've broken?

And why is everything solved via casting? Look: here's a solution that's less typing than a cast, AND it's safe. You could even put notnull in object.d!

    T notnull(U : T?, T)(U obj)
    {
        if( obj is null ) throw new NullException;
        return cast(T) obj;
    }

    void foo(Quxx o)
    {
        o.doStuff;
    }

    void foo(Quxx? o)
    {
        foo(notnull(o));
    }
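Since `T?` is proposed syntax, here is a rough sketch of the same idea with what D2 already offers, assuming a library struct rather than compiler support. NonNull, notNull, Quxx and foo are hypothetical names. `@disable this()` makes `NonNull!T x;` a compile-time error, and the single null check happens where a nullable reference is converted.

```d
import std.exception : enforce;

// Hypothetical library type: a class reference that cannot be
// default-constructed, so every NonNull!T must be initialized explicitly.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this(); // `NonNull!Object x;` no longer compiles

    this(T value)
    {
        // The one runtime check happens here, at the conversion point,
        // not at every later use.
        enforce(value !is null, "null passed where a non-null reference is required");
        payload = value;
    }

    inout(T) get() inout { return payload; }
    alias get this; // use it like a plain T
}

// Rough equivalent of the notnull() helper above, without the T? syntax.
NonNull!T notNull(T)(T obj) if (is(T == class))
{
    return NonNull!T(obj);
}

class Quxx
{
    int calls;
    void doStuff() { ++calls; }
}

// Accepts only references already known to be non-null.
void foo(NonNull!Quxx o) { o.doStuff(); }
```

Unlike a cast, the conversion fails loudly at the call site that introduced the null rather than at some later dereference.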
 All I'm saying is, nonnull references would just take the issue from
 one place to another.

YES. THAT'S THE POINT. It would take the error from a likely unrelated location in the program's execution and put it RIGHT where the mistake initially occurs!
 Like Walter said, you can put a gas mask to ignore
 the room full of toxic gas, but that doesn't solve the gas problem in
 itself, you're just denying it exists. Then someday you forget about
 it, remove the mask, and suffocate.
 
 Jeremie

That's what NPEs are! They're a *symptom* of you passing crap in to fields or functions. They very, VERY rarely actually point out what the underlying mistake is.
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Daniel Keep wrote:
 
 Jeremie Pelletier wrote:
 ...

 This is something for the runtime or the debugger to deal with. My
 runtime converts access violations on windows or segfaults on linux into
 exception objects, which unwind all the way down to main where it
 catches into the unhandled exception handler (or crash handler) and I
 get a neat popup with a "hello, your program crashed at this point, here
 is a backtrace with resolved symbols and filenames along with current
 registers and loaded modules, would you like a cup of coffee while you
 solve the problem?". I sent that crash handler to D.announce last week too.

See my long explanation that NPEs are only symptoms; very rarely do they put up a big sign saying "what ho; the problem is RIGHT HERE!"
 The compiler won't be able to enforce *every* nonnull reference and
 segfaults are bound to happen, especially with casting. While it may
 prevent most of them, any good programmer would too, I don't remember
 the last time I had a segfault on a null reference actually.

I do. It took a day and a half to track it back to the source.

Happens to me on some issues too. I don't ask for a workaround in the compiler; I just learn my lesson and never repeat that error.
 I can see what the point is with nonnull references, but I can also see
 it's not a bulletproof solution, i.e. "Object foo = cast(Object)null;"
 would easily bypass the nonnull enforcement, resulting in a segfault the
 system is trying to avoid.

Why lock the door when someone could break the window?

Easier to prove someone broke in when the window is shattered than if someone just went through the door, stole your stuff and left without any traces.
 Why have laws when people could break them?

People break the law, some of them only for the challenge of it, some of them to survive, some just don't care. Remove the laws and you remove most of these behaviors you're trying to prohibit in the first place. Most of the time laws are there so corporate criminals can get rid of street criminals legally.
 Why build a wall when someone could park a hydrogen bomb next to it?

They keep most people out, or in. Hydrogen bombs are not something you expect the first guy on the street to own.
 Why have a typesystem when you could use casting to put the float
 representation of 3.14159 into a void* and then dereference it?

Because it also allows for countless different optimizations, at the price of also being able to shoot your own foot. There, four similar questions and four completely different answers. My point is, there is no perfect all-around solution.
 Casting is not an argument against non-null references because casting
 can BREAK ANYTHING.
 
 "Doctor, it hurts when I hammer nails into my shin."
 
 "So stop doing it."

Why tell him to stop it? The guy will just kill himself at some point and raise the collective IQ of mankind in the process. Same for programming or anything else, if someone is dumb enough to repeat the same mistake over and over, he should find a new domain to work in.
 What about function parameters, a lot of parameters are optional
 references, which are tested and then used into functions whose
 parameters aren't optional. It would result in a lot of casts, something
 that could easily confuse people and easily generate segfaults.

So what you're saying is: better to never, ever do error checking and just start fixing things after they've broken?

No, but you shouldn't rule out the fact that they may break, no matter what system you're working with.
 And why is everything solved via casting?  Look: here's a solution
 that's less typing than a cast, AND it's safe.  You could even put
 notnull in object.d!
 
 T notnull(U : T?, T)(U obj)
 {
     if( obj is null ) throw new NullException;
     return cast(T) obj;
 }
 
 void foo(Quxx o)
 {
     o.doStuff;
 }
 
 void foo(Quxx? o)
 {
     foo(notnull(o));
 }

Also slower than a cast if the compiler doesn't use -inline. Debug builds are already painful enough as it is with realtime code.
 All I'm saying is, nonnull references would just take the issue from
 one place to another.

YES. THAT'S THE POINT. It would take the error from a likely unrelated location in the program's execution and put it RIGHT where the mistake initially occurs!

That's a case for variable initialization, not nullable/non-null types. A nonnull type does not guarantee the value will *never* be null, even the simplest hack can get around it.
 Like Walter said, you can put a gas mask to ignore
 the room full of toxic gas, but that doesn't solve the gas problem in
 itself, you're just denying it exists. Then someday you forget about
 it, remove the mask, and suffocate.

 Jeremie

That's what NPEs are! They're a *symptom* of you passing crap in to fields or functions. They very, VERY rarely actually point out what the underlying mistake is.

There again, I favor stronger initialization semantics over nonnull types. This will get rid of most of these errors and still keep you on your toes when a segfault arises; if you only see a segfault once a year, how will you know how to handle it? :) Most segfaults I have take me at most a few minutes to pinpoint. It's finding backdoors to compiler enforcement that's annoying.
Sep 26 2009
Christopher Wright <dhasenan gmail.com> writes:
Jeremie Pelletier wrote:
 There again, I favor stronger initialization semantics over nonnull 
 types. This will get rid of most of these errors

Only for local variables. Not for fields.
 Most segfaults I have take me at most a few minutes to pinpoint. It's 
 finding backdoors to compiler enforcement that's annoying.

You're complaining now because you'd try to cram 'null' down the throat of something marked 'not-null' and fear it would be difficult?
Sep 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Jarrett Billingsley wrote:
 It wouldn't. The compiler wouldn't allow it. It would force you to
 initialize it. That is the entire point of nonnull references.

Initialize it to what? A user-defined default object? What should happen if that default object is accessed? Throw an exception? <g> How would you define an "empty" slot in a data structure?
Sep 26 2009
grauzone <none example.net> writes:
Walter Bright wrote:
 Jarrett Billingsley wrote:
 It wouldn't. The compiler wouldn't allow it. It would force you to
 initialize it. That is the entire point of nonnull references.

Initialize it to what? A user-defined default object? What should happen if that default object is accessed? Throw an exception? <g> How would you define an "empty" slot in a data structure?

You can allow a non-nullable reference to be null, just like you allow an immutable object to be mutable during construction. You just have to make sure the non-nullable reference is definitely assigned.
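The immutable analogy already exists in D: an `immutable` field may be assigned inside the constructor but not afterwards. A small sketch follows, with Config and name as hypothetical names. A non-nullable field could presumably follow the same construction-time rule, but that part is only the proposal, not current D.

```d
class Config
{
    // Cannot be changed once construction is finished...
    immutable string name;

    this(string n)
    {
        // ...but the constructor is allowed to assign it.
        name = n;
    }

    // void rename(string n) { name = n; } // rejected: name is immutable here
}
```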
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 You just have to make sure the non-nullable reference is definitely 
 assigned.

See my reply to Denis Koroskin on that.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 If you disallow null references what would "Object foo;" initialize to 
 then?


Should: int a; be disallowed, too? If not (and explain why it should behave differently), what about: T a; in generic code?
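For reference, here is what `T a;` means in current D. The variable is not left uninitialized: it gets `T.init`, which is 0 for int, NaN for floating-point types and null for class references. The sketch below uses a hypothetical helper, defaultValue, only to make the generic case visible.

```d
// Returns the value a declaration like `T a;` holds in generic code.
T defaultValue(T)()
{
    T a;      // default-initialized to T.init
    return a;
}
```

This is the crux of the disagreement: for a non-nullable class reference there would be no sensible `T.init`, which is why the proposal requires explicit initialization instead.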
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 One more:
 
 T foo(bool someCondition)
 {
     T? t;
     if (someCondition) t = someInitializer();
     // ...
 
     if (t.isNull) { // not initialized yet
         // ...
     }
 
     return enforce(t); // throws if t is not initialized yet, because 
 foo *must* return a valid value by a contract
 }

It seems to me you've got null references there anyway? What would you do about:

    T[] a;
    a[i] = foo();

where you want to have unused slots be null (or empty, or nothing)?
Sep 26 2009
downs <default_357-line yahoo.de> writes:
Denis Koroskin wrote:
 On Sun, 27 Sep 2009 03:01:48 +0400, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
 Denis Koroskin wrote:
 One more:
  T foo(bool someCondition)
 {
     T? t;
     if (someCondition) t = someInitializer();
     // ...
      if (t.isNull) { // not initialized yet
         // ...
     }
      return enforce(t); // throws if t is not initialized yet,
 because foo *must* return a valid value by a contract
 }

It seems to me you've got null references there anyway? What would you do about:

    T[] a;
    a[i] = foo();

where you want to have unused slots be null (or empty, or nothing)?

Easy:

    T? foo(); // returns valid object or a null

    T?[] a;
    a[i] = foo();

The case of a non-null array is, I think, worthy of some more consideration.

These are the things that would not be possible with a non-nullable array:

- newing it
- setting .length to a greater value
- appending a nullable array of the same base type.

Basically, anything that may fill it with nulls.

The only two allowed instructions would be ~= NonNullable and ~= NonNullableArray. And it's good that way.
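The rules above can be approximated today with a wrapper, assuming runtime checks at the append sites stand in for the compile-time guarantee a real non-nullable element type would give. NonNullArray is a hypothetical name; it exposes `~=` for single references and arrays, and deliberately has no `length` setter and no index assignment.

```d
import std.algorithm.searching : all;
import std.exception : enforce;

// Hypothetical wrapper: the only way to grow the array is to append
// references that are checked to be non-null.
struct NonNullArray(T) if (is(T == class))
{
    private T[] data;

    // `arr ~= element`
    void opOpAssign(string op : "~")(T item)
    {
        enforce(item !is null, "cannot append null");
        data ~= item;
    }

    // `arr ~= array`
    void opOpAssign(string op : "~")(T[] items)
    {
        enforce(items.all!(x => x !is null), "cannot append an array containing null");
        data ~= items;
    }

    // Read-only access; with no length setter and no opIndexAssign,
    // the array can never be padded with null slots.
    size_t length() const { return data.length; }
    T opIndex(size_t i) { return data[i]; }
}
```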
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
downs:

 Basically, anything that may fill it with nulls.
 
 The only two allowed instructions would be ~= NonNullable and ~=
NonNullableArray. And it's good that way.

I agree. In such a situation I'd also like to have a default method to insert one or more nonnull items at any point of the array (see the insert method of Python lists, which can also be expressed as s[i:i]=[x]). Having a few basic default methods will help keep such safe arrays flexible.

Bye,
bearophile
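An insert in the spirit of Python's `list.insert` can be layered on Phobos' `std.array.insertInPlace`. The sketch below assumes a hypothetical helper, insertNonNull, that rejects nulls before touching the array, so a failed insert leaves it unchanged.

```d
import std.algorithm.searching : all;
import std.array : insertInPlace;
import std.exception : enforce;

// Hypothetical helper: inserts one or more references at position i
// (like s[i:i] = [x, ...] in Python), refusing any null element.
void insertNonNull(T)(ref T[] arr, size_t i, T[] items...) if (is(T == class))
{
    enforce(items.all!(x => x !is null), "cannot insert null");
    arr.insertInPlace(i, items);
}
```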
Sep 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 My assessment: the chances of convincing Walter he's wrong are quite 
 slim... Having a rationale for being wrong is very hard to overcome.

Especially when I'm right!
Sep 26 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 5:29 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:

 I actually side with Walter here. I much prefer my programs to crash on
 using a null reference and fix the issue than add runtime overhead that does
 the same thing. In most cases a simple backtrace is enough to pinpoint the
 location of the bug.

There is NO RUNTIME OVERHEAD in implementing nonnull reference types. None. It's handled entirely by the type system. Can we please move past this?
 Null references are useful to implement optional arguments without any
 overhead by an Optional!T wrapper. If you disallow null references what
 would "Object foo;" initialize to then?

It wouldn't. The compiler wouldn't allow it. It would force you to initialize it. That is the entire point of nonnull references.
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 01:29:55 +0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:
 [...] I much prefer my programs to crash on using a null reference and  
 fix the issue than add runtime overhead that does the same thing.

What runtime overhead are you talking about here? Using non-null pointers actually makes your program run faster, because you don't have to check them against null all the time. Non-null references are a contract, which is enforced by the compiler at compile time, not at runtime. They also make your program more consistent and less verbose.
 Null references are useful to implement optional arguments without any  
 overhead by an Optional!T wrapper.

Once again, what overhead are you talking about? Optional!(T) (or Nullable!(T)) doesn't have to have any additional bits to store the NULL state for reference types.
 If you disallow null references what would "Object foo;" initialize to  
 then?

Nothing. It's a compile-time error. But the following is not:

    Object foo = initializer();
    Nullable!(Object) foo2; // default-initialized to a null, same as currently
    Object? foo3; // a desirable syntax sugar for Nullable!(Object)
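To back up the "no additional bits" point: for class references, null itself can encode the empty state, so the wrapper is exactly one reference wide. This is a hypothetical Optional sketch, not Phobos' `std.typecons.Nullable`. The `static assert` checks the size claim at compile time, since `.sizeof` on a class type gives the size of the reference.

```d
import std.exception : enforce;

// Hypothetical Optional wrapper for class references: null encodes
// "no value", so no separate flag is stored.
struct Optional(T) if (is(T == class))
{
    private T payload; // default-initialized to null, i.e. "empty"

    this(T value) { payload = value; }

    bool isNull() const { return payload is null; }

    // Explicit, checked access to the wrapped reference.
    T get()
    {
        enforce(!isNull, "Optional is empty");
        return payload;
    }
}

// Same size as a bare reference: the wrapper adds no runtime state.
static assert(Optional!Object.sizeof == Object.sizeof);
```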
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 01:59:45 +0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 5:29 PM, Jeremie Pelletier <jeremiep gmail.com>  
 wrote:

 I actually side with Walter here. I much prefer my programs to crash on
 using a null reference and fix the issue than add runtime overhead  
 that does
 the same thing. In most cases a simple backtrace is enough to pinpoint  
 the
 location of the bug.

There is NO RUNTIME OVERHEAD in implementing nonnull reference types. None. It's handled entirely by the type system. Can we please move past this?
 Null references are useful to implement optional arguments without any
 overhead by an Optional!T wrapper. If you disallow null references what
 would "Object foo;" initialize to then?

It wouldn't. The compiler wouldn't allow it. It would force you to initialize it. That is the entire point of nonnull references.

How would you do this then?

    void foo(int a) {
        Object foo;
        if(a == 1) foo = new Object1;
        else if(a == 2) foo = Object2;
        else foo = Object3;
        foo.doSomething();
    }

Let's consider the following example first:

    void foo(int a) {
        Object foo;
        if (a == 1) foo = Object1;
        else if(a == 2) foo = Object2;
        else if(a == 3) foo = Object3;
        foo.doSomething();
    }

Do you agree that this program has a bug? It is buggy because one of the paths skips initialization of the "foo" variable.

Now back to your question. My answer is that the compiler should be smart enough to differentiate between the two cases and raise a compile-time error in the latter one. That's what the C# compiler does: the first case compiles successfully while the second one doesn't.

Until then, non-nullable references are too hard to use to become useful, because you'll end up with a lot of initializer functions:

    void foo(int a) {
        Object initializeFoo() {
            if (a == 1) return new Object1();
            if (a == 2) return new Object2();
            return new Object3();
        }

        Object foo = initializeFoo();
        foo.doSomething();
    }

I actually believe the code is clearer that way, but there are cases when you can't do it (initializing a few variables, for example).
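One way around the "initialize a few variables" case is to have the initializer return them together. This is a sketch using Phobos' `std.typecons.Tuple`. initializeBoth and describe are hypothetical names, and strings and ints stand in for the objects in the post.

```d
import std.conv : to;
import std.typecons : Tuple, tuple;

// Hypothetical example: two related values that depend on the same branch.
// Returning them together lets both be initialized at their declaration.
Tuple!(string, int) initializeBoth(int a)
{
    if (a == 1) return tuple("one", 10);
    if (a == 2) return tuple("two", 20);
    return tuple("other", 0);
}

string describe(int a)
{
    auto both = initializeBoth(a);
    // Each variable is bound to a value immediately; neither is ever left unset.
    string label = both[0];
    int weight = both[1];
    return label ~ ":" ~ weight.to!string;
}
```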
Sep 26 2009
language_fan <foo bar.com.invalid> writes:
Sat, 26 Sep 2009 14:49:45 -0700, Walter Bright thusly wrote:

 The problem with non-nullable references is what do they default to?
 Some "nan" object? When you use a "nan" object, what should it do?
 Throw an exception?

Well, typically if your type system supports algebraic types, you can define a higher-order Optional type as follows:

    type Optional T = Some T | Nothing

Now a safe nullable reference type would look like

    Optional (T*)

The whole point is to make null pointer tests explicit. You can pass around the optional type freely, and only at the actual use site do you need to pattern match it to see if it's a null pointer:

    void foo(SafeRef[int] a) {
      match(a) {
        case Nothing => // handle null pointer
        case Some(b) => return b + 2;
      }
    }

The default initialization of this type is Nothing.

Some data structures can be initialized in a way that null pointers don't exist. In these cases you can use a type that does not have the 'Nothing' form. This can lead to nice optimizations. There is no default value, because default initialization can never occur.
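The `Some T | Nothing` type can be approximated in D with Phobos' `std.sumtype` (added in DMD 2.097, so older compilers lack it). In this sketch the "Some" case is simply the payload type, and Nothing, Optional and foo are hypothetical names. Nothing is listed first so that a default-initialized Optional holds Nothing. `match!` is exhaustive: leaving out either handler is a compile-time error.

```d
import std.sumtype : SumType, match;

// Marker type for the "no value" case.
struct Nothing {}

// Sketch of `type Optional T = Some T | Nothing`; listing Nothing first
// makes it the default-initialized state.
alias Optional(T) = SumType!(Nothing, T);

// The null case cannot be ignored: match! requires a handler per alternative.
int foo(Optional!(int*) a)
{
    return a.match!(
        (Nothing _) => -1,   // the "null pointer" case
        (int* b) => *b + 2
    );
}
```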
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 01:49:45 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 The problem with non-nullable references is what do they default to?  
 Some "nan" object? When you use a "nan" object, what should it do? Throw  
 an exception?

Oh, my! You don't even know what a non-null default is! There is a Null Object pattern (http://en.wikipedia.org/wiki/Null_Object_pattern), and I guess that's what you are talking about when you mean "nan object", but it has little to do with non-null references. With non-null references, you don't have "wrong values" that throw an exception upon use (although it's clearly possible); you get a correct value. If an object may or may not have a valid value, you mark it as nullable. The only difference is that this is now non-default behavior, that's it. And a user is now warned that an object may be uninitialized.
Sep 26 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 5:59 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 How would you do this then?

 void foo(int a) {
     Object foo;
     if(a == 1) foo = new Object1;
     else if(a == 2) foo = Object2;
     else foo = Object3;
     foo.doSomething();
 }

 The compiler would just die on the first line of the method where foo is
 null.

Either use Object? (a nullable reference), or factor out the object creation - use a separate method or something.
 What about "int a;" should this throw an error too? Or "float f;".

Those are not reference types. But actually, the D spec says it's an error to use an uninitialized variable, so a compliant D compiler wouldn't be out of line in diagnosing such things as errors if they are used before they're initialized. Such a compiler would break a lot of existing D code, but that's what you get for not following the spec.
 What about standard pointers? I can think of so many algorithms who rely on
 pointers possibly being null.

Again, you have both nonnull (void*) and nullable (void*?) types.
 Maybe this could be a case to add in SafeD but leave out in standard D. I
 wouldn't want a nonnull reference type, I use nullables just too often.

You probably use them far less than you'd think.
Sep 26 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 6:10 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Jarrett Billingsley wrote:
 It wouldn't. The compiler wouldn't allow it. It would force you to
 initialize it. That is the entire point of nonnull references.

Initialize it to what? A user-defined default object? What should happen if that default object is accessed? Throw an exception? <g>

The point of using a nonnull type is that you *never expect it to be null ever*. So you would be initializing it to some useful object. If you *want* null, you'd use a nullable reference.
 How would you define an "empty" slot in a data structure?

A nullable reference.
Sep 26 2009
language_fan <foo bar.com.invalid> writes:
Sat, 26 Sep 2009 17:59:45 -0400, Jeremie Pelletier thusly wrote:

 How would you do this then?
 
 void foo(int a) {
 	Object foo;
 	if(a == 1) foo = new Object1;
 	else if(a == 2) foo = Object2;
 	else foo = Object3;
 	foo.doSomething();
 }

I just LOVE to see questions like these ;) You still have SO much to learn. Go grab 'Purely Functional Data Structures' by Chris Okasaki from the nearest library and see how many pages you can read before your head explodes. No, it is a purely enlightening process actually :)
Sep 26 2009
language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 02:15:33 +0400, Denis Koroskin thusly wrote:

 Until then, non-nullable references are too hard to use to become
 useful, because you'll end up with a lot of initializer functions:
 
 void foo(int a) {
 	Object initializeFoo() {
 		if (a == 1) return new Object1();
 		if (a == 2) return new Object2();
 		return new Object3();
          }
 
 	Object foo = initializeFoo();
 	foo.doSomething();
 }
 
 I actually believe the code is more clear that way, but there are cases
 when you can't do it (initialize a few variables, for example)

Having a functional switch() helps a lot. I write code like this every day:

    val foo = predicate.match {
      case 1 => new Object1
      case 2 => new Object2("foo", "bar")
      case _ => new DefaultObject
    }

    foo.doSomething

I also rarely have runtime bugs these days.
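D's switch is a statement, but wrapping it in an immediately invoked function literal gives roughly the same "switch as expression" shape. A sketch with hypothetical classes mirroring the Scala snippet above. The casts to Base give every return the same type so the literal's return type can be inferred.

```d
// Hypothetical classes mirroring the example above.
class Base { string kind() { return "default"; } }
class Object1 : Base { override string kind() { return "one"; } }
class Object2 : Base
{
    string a, b;
    this(string a, string b) { this.a = a; this.b = b; }
    override string kind() { return "two:" ~ a ~ "," ~ b; }
}
class DefaultObject : Base {}

Base choose(int predicate)
{
    // The function literal is called immediately, so `foo` is
    // initialized exactly once, at its declaration.
    Base foo = () {
        switch (predicate)
        {
            case 1:  return cast(Base) new Object1;
            case 2:  return cast(Base) new Object2("foo", "bar");
            default: return cast(Base) new DefaultObject;
        }
    }();
    return foo;
}
```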
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 02:18:15 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
 If you disallow null references what would "Object foo;" initialize to  
 then?


Should: int a; be disallowed, too? If not (and explain why it should behave differently), what about: T a; in generic code?

Functional languages don't distinguish between the two (reference or not). We were discussing "non-null by default" references because it's a far less radical change to the language than "non-null by default" for all types.

Once again, you are taking code out of context. It is worthless to discuss "int a;" on its own. I'll try to put the context back and show a few concrete examples (where T is a generic type):

    void foo() {
        T t;
    }

Results in: error (Unused variable 't').

    T foo(bool someCondition) {
        T t;
        if (someCondition) t = someInitializer();
        return t;
    }

Results in: error (Use of potentially unassigned variable 't')

    T foo(bool someCondition) {
        T t;
        if (someCondition) t = someInitializer();
        else t = someOtherInitializer();
        return t;
    }

Results in: successful compilation
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 02:43:05 +0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Sun, 27 Sep 2009 02:18:15 +0400, Walter Bright  
 <newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
 If you disallow null references what would "Object foo;" initialize  
 to then?


Should: int a; be disallowed, too? If not (and explain why it should behave differently), what about: T a; in generic code?

Functional languages don't distinguish between the two (reference or not). We were discussing "non-null by default" references because it's a far less radical change to the language than "non-null by default" for all types.

Once again, you are taking code out of context. It is worthless to discuss "int a;" on its own. I'll try to put the context back and show a few concrete examples (where T is a generic type):

    void foo() {
        T t;
    }

Results in: error (Unused variable 't').

    T foo(bool someCondition) {
        T t;
        if (someCondition) t = someInitializer();
        return t;
    }

Results in: error (Use of potentially unassigned variable 't')

    T foo(bool someCondition) {
        T t;
        if (someCondition) t = someInitializer();
        else t = someOtherInitializer();
        return t;
    }

Results in: successful compilation

One more:

    T foo(bool someCondition) {
        T? t;
        if (someCondition) t = someInitializer();
        // ...

        if (t.isNull) { // not initialized yet
            // ...
        }

        return enforce(t); // throws if t is not initialized yet, because foo *must* return a valid value by a contract
    }
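The last line already works in today's D for ordinary (nullable) class references. `std.exception.enforce` returns its argument when it is non-null and throws otherwise. A sketch with hypothetical Widget and someInitializer standing in for T and the post's initializer.

```d
import std.exception : enforce;

// Hypothetical stand-ins for T and someInitializer() in the post.
class Widget {}
Widget someInitializer() { return new Widget; }

Widget foo(bool someCondition)
{
    Widget t; // nullable: default-initialized to null in current D
    if (someCondition) t = someInitializer();

    if (t is null)
    {
        // not initialized yet; a real function might try another source here
    }

    // enforce passes a non-null t through and throws on null,
    // so callers never receive null from foo.
    return enforce(t, "foo must return a valid Widget");
}
```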
Sep 26 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 02:49:06 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
 On Sun, 27 Sep 2009 01:49:45 +0400, Walter Bright  
 <newshound1 digitalmars.com> wrote:

 The problem with non-nullable references is what do they default to?  
 Some "nan" object? When you use a "nan" object, what should it do?  
 Throw an exception?

There is a Null Object pattern (http://en.wikipedia.org/wiki/Null_Object_pattern); I guess that's what you mean by "nan object", but it has little to do with non-null references.

It's the black hole object. It prevents you from getting a seg fault, but I see no rationale for expecting that an unexpected null object always returning "I succeeded" means your program will operate correctly. The white hole object, of course, always throws an exception when it is accessed. At least you know something went wrong, but you already have that with null.
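Current Phobos has both kinds of hole object as templates in std.typecons: BlackHole!I implements every abstract method by returning the default value of its return type, and WhiteHole!I implements every abstract method by throwing an Error. The sketch below uses a hypothetical PrimeSource interface, echoing the "prime number > 8" example in this thread, to show the difference between failing silently and failing loudly.

```d
import std.typecons : BlackHole, WhiteHole;

// Hypothetical interface: should return a prime number greater than n.
interface PrimeSource
{
    int primeAbove(int n);
}

// "Black hole": every method silently returns the default value (0 for int).
PrimeSource makeBlackHole() { return new BlackHole!PrimeSource; }

// "White hole": every method throws an Error as soon as it is called.
PrimeSource makeWhiteHole() { return new WhiteHole!PrimeSource; }
```

The black hole hands back 0 where a prime > 8 was expected, and nothing stops that value from flowing onward; the white hole stops at the first call.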
 With non-null references, you don't have "wrong values", that throw an  
 exception upon use (although it's clearly possible), you get a correct  
 value.

You're not getting a correct value, you're getting another default value.

I'm sorry, but I cannot continue the discussion with you like that. You are not listening! You are not even trying to understand what I say. We are talking about *completely* different things here. Who the hell cares whether it's black or white, as long as it is a hole object? I tell you that no one is gonna use it, because it's *much* easier to do everything right (i.e. initialize a reference to a proper value) than to create Hole classes for each class/interface. I can't even imagine anyone writing code like this:

    T someFunction(Args someArgs)
    {
        ISomeInterface someInterface = new BlackHoleOfSomeInterfaceThatAlwaysThrows(); // let's initialize that variable to some dumb value just to make the compiler happy
        // rest of the method body
    }

Novices, maybe. But professionals would never commit that sin, for sure. *Please* let's go past that pattern; it really has nothing to do with the proposed non-null-by-default references.
 If the logic of your program is expecting a prime number > 8, and the  
 null object returns 0, now what?

You are talking heresy here. I'm afraid you don't even know what you are talking about.

A) There is no such thing as a null object. That's bullsh*t! No one ever proposed using those; you did. And now you argue against using them and discuss how bad they are.
B) You can't call a function that accepts a non-null T if you don't have a valid (i.e. non-null) T reference. End of story.
 If an object may or may not have a valid value, you mark it as  
 nullable. All the difference is that it's a non-default behavior,  
 that's it. And a user is now warned, that an object may be not  
 initialized.

He isn't warned; that's just the problem. The null object happily says "I succeeded" for all input and returns more default values and null objects.

Please stop that "null object" pattern propaganda. Did you even read what I wrote? I wrote: if a variable may not be initialized, no problem, make it nullable and assign null! A user is then forced to check that variable against null before dereferencing it, and must take appropriate action if it is null.
 What happens if the output of your program then becomes a null object?  
 How are you going to go about tracing that back to its source? That's a  
 lot harder than working backwards from where a null exception originated.

 I used to work at Boeing designing critical flight systems. Absolutely  
 the WRONG failure mode is to pretend nothing went wrong and happily  
 return default values and show lovely green lights on the instrument  
 panel. The right thing is to immediately inform the pilot that something  
 went wrong and INSTANTLY SHUT THE BAD SYSTEM DOWN before it does  
 something really, really bad, because now it is in an unknown state. The  
 pilot then follows the procedure he's trained to, such as engage the  
 backup.

 A null pointer exception fits right in with that philosophy.

 You could think of null exceptions like pain - sure it's unpleasant, but  
 people who feel no pain constantly injure themselves and don't live very  
 long. When I went to the dentist as a kid for the first time, he shot my  
 cheek full of novocaine. After the dental work, I went back to school. I  
 found to my amusement that if I chewed on my cheek, it didn't hurt.

 Boy was I sorry about that later <g>.

That's trolling, Walter. I'm sorry, but you are talking nonsense here. Once again, no one ever proposed using the null object pattern. You imagined it and are now arguing against its use.
Sep 26 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 03:01:48 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
 One more:
  T foo(bool someCondition)
 {
     T? t;
     if (someCondition) t = someInitializer();
     // ...
      if (t.isNull) { // not initialized yet
         // ...
     }
      return enforce(t); // throws if t is not initialized yet, because  
 foo *must* return a valid value by a contract
 }

It seems to me you've got null references there anyway? What would you do about:

    T[] a;
    a[i] = foo();

where you want to have unused slots be null (or empty, or nothing)?

Easy:

    T? foo(); // returns a valid object or null

    T?[] a;
    a[i] = foo();
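One way to approximate T?[] in present-day D, as a sketch: an array of std.typecons.Nullable, where every slot starts out empty until something is stored in it. The producer foo (here taking an index so the result is predictable), fill, and Widget are hypothetical.

```d
import std.typecons : Nullable;

class Widget { int id; this(int id) { this.id = id; } }

// Hypothetical producer, "T? foo();": returns a valid object or nothing.
// Here even indices produce a Widget and odd indices produce nothing.
Nullable!Widget foo(size_t i)
{
    if (i % 2 == 0)
        return Nullable!Widget(new Widget(cast(int) i));
    return Nullable!Widget.init; // empty
}

// "T?[] a;": every slot starts out empty instead of holding a stray null.
Nullable!Widget[] fill(size_t n)
{
    auto a = new Nullable!(Widget)[](n);
    foreach (i; 0 .. n)
        a[i] = foo(i);
    return a;
}
```

Because the element type says "may be empty", code reading a slot has to go through isNull/get rather than dereferencing it directly.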
Sep 26 2009
prev sibling next sibling parent reply language_fan <foo bar.com.invalid> writes:
Sat, 26 Sep 2009 15:49:06 -0700, Walter Bright thusly wrote:

 I used to work at Boeing designing critical flight systems. Absolutely
 the WRONG failure mode is to pretend nothing went wrong and happily
 return default values and show lovely green lights on the instrument
 panel.

Basically, if there is only one way the system can operate correctly, your approach is to catch errors at runtime (segfaults) until a later iteration of the program turns out to work correctly or well enough. Meanwhile there are several buggy revisions of the program in the development process. The idea behind non-nullable types and other contracts is to catch these errors at compile time. Sure, the code is a bit harder to write, but it is safe and never segfaults. The idea is to minimize the number of runtime errors of all sorts. That's also how other features of statically typed languages work.
Sep 26 2009
parent Leandro Lucarella <llucax gmail.com> writes:
grauzone, on September 27 at 22:31, you wrote:
Woah, nice. I stand corrected. Is this in druntime already?

Not yet, it's part of a custom runtime I'm working on and wish to release under a public domain license when I get the time. The code is linked from a thread in D.announce.

Some of this functionality is also in Tango (SVN version). Signals are caught only to print a backtrace.

I think this is a very bad idea. When the program receives a segfault, I want my lovely core dump. A core dump is way more useful than any possible backtrace. I really don't see any use for it, except if an uncaught exception could generate a core dump (just as GCC does for C++ code). But I *really* *really* want my core dump, so I can open my debugger and inspect the dead program exactly at the point where it failed.
Sep 27 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 02:04:06 +0200, Yigal Chripun thusly wrote:

 segfaults are *NOT* a good mechanism to handle errors. An exception
 trace gives you a whole lot more information about what went wrong and
 where compared to a segfault.

Indeed, especially since in the case of D half of the userbase has a broken linker (optlink) and the other half has a broken debugger (gdb). I'd much rather write non-segfaulting applications in a language without a debugger than write buggy crap and debug it with the world's best debugger.
Sep 26 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 7:21 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 That's exactly the point with nonnull references, they turn access
 violations or segfaults into undefined behavior, or worse into generic
 behavior that's much harder to track to its source.

 I think nonnull references are a nice concept for languages that have a
 higher level than D. If I expect references to never be null I just don't
 check for null before using them, and let the code crash which gives me a
 nice crash window with a backtrace in my runtime.

You're missing the point. You wouldn't have "undefined behavior at runtime" with nonnull references because there would be NO POINT in having nonnull references without ALSO having nullable references. Could your reference be null? Use a nullable reference. Is your reference never supposed to be null? Use a nonnull reference. End of problem. You do not create "null objects" and store them in a nonnull reference which you then check at runtime. You use a nullable reference which the language *forces* you to check before use.
Sep 26 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 10:59 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:

 The compiler won't be able to enforce *every* nonnull reference and
 segfaults are bound to happen, especially with casting. While it may prevent
 most of them, any good programmer would too, I don't remember the last time
 I had a segfault on a null reference actually.

 I can see what the point is with nonnull references, but I can also see it's
 not a bulletproof solution. ie "Object foo = cast(Object)null;" would easily
 bypass the nonnull enforcement, resulting in a segfault the system is trying
 to avoid.

 What about function parameters, a lot of parameters are optional references,
 which are tested and then used into functions whose parameters aren't
 optional. It would result in a lot of casts, something that could easily
 confuse people and easily generate segfaults.

You haven't read my reply to your post yet, have you? Nullable. References. Exist. Too.
Sep 26 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 11:06 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 I don't want the language to force me to check nullable references before
 using them, that just takes away a lot of optimization cases.

You don't design tight loops that dereference pointers with the intention that those pointers will ever be null. Those loops always expect nonnull pointers, and therefore you'd use a nonnull reference. The number of null references in your program is far smaller than you'd think. For those that really could be legitimately null (like an optional callback or something), you have to check for null at runtime anyway. Most of your code wouldn't really change. You'd just get more errors at compile time for things that are obviously illegal or very potentially dangerous.
 You could just use the casting system to sneak null into a nonnull reference
 and bam, undefined behavior.

No, you couldn't. That would be a pretty shitty nonnull reference type if the compiler let you put null in it.
 And you could have nullables which are always
 nonnull at some point in time within a scope but your only way out of the
 compiler errors about using a nullable without first testing it for nullity
 is to use excessive casting.

The verbosity argument that comes up with nonnull references holds some weight, but it's far more a matter of designing your code not to do something like that.
Sep 26 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 00:27:14 -0700, Walter Bright thusly wrote:

 You seem to be under the impression that nothing can be made
 uncrashable without introducing the possibility of corrupted state.
 That's hogwash.


What I mean by safe is that no matter what you do, you cannot make the program crash or cause memory corruption. If you look at typical functional languages, unless FFI is used, the only ways a program may fail are: a) running out of stack memory, b) running out of heap memory, c) the program never halts (the halting problem), d) the developer explicitly kills the program, e.g. with the Error type. Note that if your language is simple enough, say the simply typed lambda calculus, you do not have the third problem anymore. All of these errors can also happen in D, but none of D's other problems happen in those languages.
Sep 27 2009
prev sibling next sibling parent Jesse Phillips <jessekphillips gmail.com> writes:
On Sun, 27 Sep 2009 10:10:19 -0400, Nick Sabalausky wrote:

 "Walter Bright" <newshound1 digitalmars.com> wrote in message
 news:h9n3k5$2eu9$1 digitalmars.com...
 Jason House wrote:
 Also, by "safe" I presume you mean "memory safe" which means free of
 memory corruption. Null pointer exceptions are memory safe. A null
 pointer could be caused by memory corruption, but it cannot *cause*
 memory corruption.

I reject this argument too :( To me, code isn't safe if it crashes.

Well, we can't discuss this if we cannot agree on terms. The conventional definition of memory safe means no memory corruption.

He keeps saying "safe", and every time he does you turn it into "memory safe". If he meant "memory safe" he probably would have said something like "memory safe". He already made it perfectly clear he's talking about crashes, so continuing to put the words "memory safe" into his mouth doesn't help the discussion.

The thing is that memory safety is the only safety with code. In Walter's examples he very clearly showed that a crash is not unsafe, but operating with incorrect values is. He has pointed out that if initialization is enforced, whether with a default or by coder, there is a good chance it will be initialized to the wrong value. Now if you really want to throw some sticks into the spokes, you would say that if the program crashes due to a null pointer, it is still likely that the programmer will just initialize/set the value to a "default" that still isn't valid just to get the program to continue to run.
Sep 27 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 16:47:51 +0000, Jesse Phillips thusly wrote:

 The thing is that memory safety is the only safety with code. In
 Walter's examples he very clearly showed that a crash is not unsafe, but
 operating with incorrect values is. He has pointed out that if
 initialization is enforced, whether with a default or by coder, there is
 a good chance it will be initialized to the wrong value.

Have you ever used functional languages? When you develop in Haskell or SML, how often do you feel there is a good chance something will be initialized to the wrong value? Can you show some statistics that show how unsafe this practice is? When non-nullability is made optional, you *only* use it when you really know the initialization has a sane value, ok? Otherwise you can use the good old nullable references, right?
 Now if you really want to throw some sticks into the spokes, you would
 say that if the program crashes due to a null pointer, it is still
 likely that the programmer will just initialize/set the value to a
 "default" that still isn't valid just to get the program to continue to
 run.

Why should it crash in the first place? I hate crashes. You like them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.
Sep 27 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 28 Sep 2009 01:31:44 +0400, Jeremie Pelletier <jeremiep gmail.com>  
wrote:

 Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Is this Linux specific? what about other *nix systems, like BSD and  
 solaris?

Signal handlers are standard on most *nix platforms since they're part of the POSIX C standard library; maybe some platforms will require special handling, but nothing impossible to do.

Walter and myself this morning; then I suggested he post it, but he is probably off email for a short while. Hopefully the community will find a solution to the issue he's raising. Let me post this:

===================
Sean Kelly wrote:

There's one minor problem with his code. It's not safe to throw an exception from a signal handler. Here's a quote from the POSIX spec at opengroup.org:

"In order to prevent errors arising from interrupting non-reentrant function calls, applications should protect calls to these functions either by blocking the appropriate signals or through the use of some programmatic semaphore (see semget(), sem_init(), sem_open(), and so on). Note in particular that even the "safe" functions may modify errno; the signal-catching function, if not executing as an independent thread, may want to save and restore its value. Naturally, the same principles apply to the reentrancy of application routines and asynchronous data access. Note that longjmp() and siglongjmp() are not in the list of reentrant functions. This is because the code executing after longjmp() and siglongjmp() can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use longjmp() and siglongjmp() from within signal handlers require rigorous protection in order to be portable."

If this were an acceptable approach it would have been in druntime ages ago :-)
===================

Andrei

Yes, but the segfault signal handler is not meant for designing code that can live with these exceptions; it's just a feature that lets segfaults be sent to the crash handler to get a backtrace dump. Even on Windows, where you can recover from access violations, it's generally a bad idea to allow bugs to be turned into features. Jeremie

Isn't this reason alone strong enough to encourage the use of non-null references? And to implement them, since we don't have the feature currently.
Sep 28 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips  
<jesse.k.phillips+d gmail.com> wrote:

 language_fan Wrote:

 Have you ever used functional languages? When you develop in Haskell or
 SML, how often do you feel there is a good chance something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, but that it is possible.

It creates an invalid, non-compiling program. It's simple:

If initialization doesn't make sense, don't use a non-nullable type.
If initialization makes sense, use a non-nullable type and initialize it with the correct value.

In case 1, we are back to the current behavior, no problem (in Walter's eyes). In case 2, we eliminate any possible crash due to non-initialization.

The subtle difference is the *default*. If non-null is the default, then you haphazardly write code like this:

    Object o;

And you get a compile error "error, please initialize o or declare as Object? o". It makes you look at the line and say "hm... does it make sense to initialize there?" and you either put an initializer or you change it to

    Object? o;

and move on. 90% of the time, you write something like:

    auto x = new Object();

and you don't even have to think about it. The compiler tells you when you got it wrong, and usually you then get it right after a moment of thought. At least, that has been my experience with C# (granted, it uses flow analysis, not non-nullable defaults). And I very seldom have null exception errors in my C# programs (they do happen, but of course, I get a nice stack trace). Compare that to D, where I build my program and get:

    # ./program_that_I_just_spent_1_week_writing_and_getting_to_compile
    Segmentation fault.
    #

I'd rather spend an extra 5 minutes having the D compiler complain about initialization than face the segmentation fault search.

The thing is, I don't want D to cater to the moronic programmers that say "what? I need to initialize, ok, um.. here's a dummy object". I want it to cater to *me* and prevent *me* from making simple errors where I obviously should have known better, but accidentally left out the initializer.

It's like the whole "allowing object == null" problem (coincidentally, resulting in the same dreaded error). Once Walter implemented the compiler check that flagged them all, he discovered Phobos had several of those *obviously incorrect* statements. Hm... maybe he should do the same with this... Maybe someone who can hack the compiler can do it for him! Any takers?

-Steve

P.S. I never make the object == null mistake anymore. The compiler has trained me :)
Sep 28 2009
prev sibling next sibling parent reply language_fan <foo bar.com.invalid> writes:
Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips thusly wrote:

 language_fan Wrote:
 
 Now if you really want to throw some sticks into the spokes, you
 would say that if the program crashes due to a null pointer, it is
 still likely that the programmer will just initialize/set the value
 to a "default" that still isn't valid just to get the program to
 continue to run.

Why should it crash in the first place? I hate crashes. You like them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

No one likes programs that crash; doesn't that mean it is incorrect behavior, though?
 Have you ever used functional languages? When you develop in Haskell or
 SML, how often do you feel there is a good chance something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, but that it is possible.

Value types can be incorrectly initialized and nobody notices. E.g.

    int min;

    foreach(int value; list)
      if (value < min) min = value;

Oops, you forgot to define a flag variable or initialize to int.max (if that is what you want). Even Java IDEs spot this error, but not D. The flow analysis helps me in tremendous ways: I can fix the error statically and boom, the software is suddenly error free again.

Now I can tell you, in functional languages there is no other way. All initializations have to be correct, they are final, they are constants, and they can be initialized incorrectly. But there are some tools that help with this. Functions can be automatically tested. Invariants, pre- and post-conditions can be set. Still, I can even bet they are much safer than D in every possible way. How is this possible?

It really depends on your subjective opinion whether you want a program to segfault, or to spot a set of errors statically and have illegally behaving non-crashing programs. I say FFFFFFFFFFUUUUUUUUUUU every time I experience a segfault. My hobby programs at home are not that critical, and at work the critical code is *proven* to be correct, so no need to worry there.
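The bug above is easy to reproduce in D. A minimal sketch with hypothetical function names: because int.init is 0, the loop never updates min for an all-positive list, and the function silently returns 0. Seeding with the first element (and treating an empty list as a caller error) avoids the made-up starting value.

```d
// The pattern from the post: min starts at int.init, which is 0.
int buggyMin(const(int)[] list)
{
    int min;
    foreach (value; list)
        if (value < min)
            min = value;
    return min; // 0 for any list whose elements are all positive
}

// One fix: start from a real element and make the empty case a caller error.
int seededMin(const(int)[] list)
{
    assert(list.length > 0, "seededMin needs a non-empty list");
    int min = list[0];
    foreach (value; list[1 .. $])
        if (value < min)
            min = value;
    return min;
}
```

Note that buggyMin neither crashes nor reports anything; it just returns a plausible-looking wrong answer.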
Sep 28 2009
parent "Nick Sabalausky" <a a.a> writes:
"language_fan" <foo bar.com.invalid> wrote in message 
news:h9relp$1ebg$4 digitalmars.com...
 Mon, 28 Sep 2009 22:33:26 +0000, language_fan thusly wrote:

 Value types can be incorrectly initialized and nobody notices. E.g.

   int min;

   foreach(int value; list)
     if (value < min) min = value;

 Now I can tell you, in functional languages there is no other way. All
 initializations have to be correct, they are final, they are constants
 and they can be initialized incorrectly. But there are some tools that
 help in this. Functions can be automatically tested. Invariants, pre-
 and post-conditions can be set. Still, I can even bet they are much
 safer than D in every possible way. How is this possible?

For instance, if I use the example given above, I write it like this in a functional language:

    find_min :: Ord a => [a] -> Maybe a
    find_min [] = Nothing
    find_min (h:t) = Just $ foldl min h t

You can then use QuickCheck to verify the result in some fancy way. I just cannot think of any way you could crash programs written in this way. They are solid as a rock.

I'm not particularly accustomed to that sort of syntax. Am I correct in my analysis that that essentially does something like this?:

    // Assuming that:
    // 1. Variables of type void could be declared and had value 'void'.
    // 2. 'any(T,U,V)' was a "supertype" that can and must be one (and only one) of T, U, or V.

    immutable any(int,void) min(immutable any(int,void) a, immutable any(int,void) b)
    {
        static if( is(typeof(a) == void) && is(typeof(b) == void) )
            return void;
        else static if( is(typeof(a) == int) && is(typeof(b) == void) )
            return a;
        else static if( is(typeof(a) == void) && is(typeof(b) == int) )
            return b;
        else
            return a<b? a : b;
    }

    immutable any(int,void) findMin(immutable int[] list)
    {
        static if(list.length == 0)
            return void;
        else
            return reduce!("min(a,b)")(list); // 'reduce' from phobos2
    }
Sep 28 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Mon, 28 Sep 2009 22:33:26 +0000, language_fan thusly wrote:

 Value types can be incorrectly initialized and nobody notices. E.g.
 
   int min;
 
   foreach(int value; list)
     if (value < min) min = value;

 Now I can tell you, in functional languages there is no other way. All
 initializations have to be correct, they are final, they are constants
 and they can be initialized incorrectly. But there are some tools that
 help in this. Functions can be automatically tested. Invariants, pre-
 and post-conditions can be set. Still, I can even bet they are much
 safer than D in every possible way. How is this possible?

For instance, if I use the example given above, I write it like this in a functional language:

    find_min :: Ord a => [a] -> Maybe a
    find_min [] = Nothing
    find_min (h:t) = Just $ foldl min h t

You can then use QuickCheck to verify the result in some fancy way. I just cannot think of any way you could crash programs written in this way. They are solid as a rock.
Sep 28 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Mon, 28 Sep 2009 20:17:54 -0400, Nick Sabalausky thusly wrote:

 "language_fan" <foo bar.com.invalid> wrote in message
 news:h9relp$1ebg$4 digitalmars.com...
 Mon, 28 Sep 2009 22:33:26 +0000, language_fan thusly wrote:

 Value types can be incorrectly initialized and nobody notices. E.g.

   int min;

   foreach(int value; list)
     if (value < min) min = value;

 Now I can tell you, in functional languages there is no other way. All
 initializations have to be correct, they are final, they are constants
 and they can be initialized incorrectly. But there are some tools that
 help in this. Functions can be automatically tested. Invariants, pre-
 and post-conditions can be set. Still, I can even bet they are much
 safer than D in every possible way. How is this possible?

For instance, if I use the example given above, I write it like this in a functional language:

    find_min :: Ord a => [a] -> Maybe a
    find_min [] = Nothing
    find_min (h:t) = Just $ foldl min h t

You can then use QuickCheck to verify the result in some fancy way. I just cannot think of any way you could crash programs written in this way. They are solid as a rock.

I'm not particularly accustomed to that sort of syntax. Am I correct in my analysis that that essentially does something like this?:

    // Assuming that:
    // 1. Variables of type void could be declared and had value 'void'.
    // 2. 'any(T,U,V)' was a "supertype" that can and must be one (and only one) of T, U, or V.

    immutable any(int,void) min(immutable any(int,void) a, immutable any(int,void) b)
    {
        static if( is(typeof(a) == void) && is(typeof(b) == void) )
            return void;
        else static if( is(typeof(a) == int) && is(typeof(b) == void) )
            return a;
        else static if( is(typeof(a) == void) && is(typeof(b) == int) )
            return b;
        else
            return a<b? a : b;
    }

    immutable any(int,void) findMin(immutable int[] list)
    {
        static if(list.length == 0)
            return void;
        else
            return reduce!("min(a,b)")(list); // 'reduce' from phobos2
    }

Well, to be honest, I thought I knew how to read D, but this is starting to look a bit scary. It looks like it does almost the same thing. I just used lists instead of arrays since they are the basic data type in functional code. Second, find_min accepts any type that implements the 'Ord' class, i.e. supports the '<' relation, not only ints. I guess it could be solved by changing some pieces of code to look like this:
 immutable any(T,void) findMin(T)(immutable T[] list) {

My original idea was just to show that it is much harder to make similar kinds of errors with algebraic data types. I should have made it less generic :-)
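For comparison, a rough D analogue of the Haskell find_min, as a sketch rather than either poster's code: std.typecons.Nullable plays the role of Maybe, and std.algorithm.iteration.reduce with an explicit seed plays the role of foldl. The name findMin and the inline comparison lambda are assumptions of this sketch. Like the Haskell version, it accepts any element type that supports <.

```d
import std.algorithm.iteration : reduce;
import std.typecons : Nullable;

// Rough analogue of  find_min :: Ord a => [a] -> Maybe a
Nullable!T findMin(T)(T[] list)
{
    // find_min [] = Nothing
    if (list.length == 0)
        return Nullable!T.init;

    // find_min (h:t) = Just $ foldl min h t
    auto smallest = reduce!((a, b) => b < a ? b : a)(list[0], list[1 .. $]);
    return Nullable!T(smallest);
}
```

The empty case has no made-up default value to hide behind: a caller must check isNull before calling get.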
Sep 28 2009
prev sibling next sibling parent Jesse Phillips <jessekphillips gmail.com> writes:
On Mon, 28 Sep 2009 16:01:10 -0400, Steven Schveighoffer wrote:

 On Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips
 <jesse.k.phillips+d gmail.com> wrote:
 
 language_fan Wrote:

 Have you ever used functional languages? When you develop in Haskell
 or SML, how often do you feel there is a good chance something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, but that it is possible.

It creates an invalid, non-compiling program.

No it doesn't; I'm not referring to null as the invalid state.

    float a;

In this program it is invalid for 'a' to equal zero. If the compiler complains it is not initialized, the programmer could fulfill the requirements with

    float a = 0;

Hopefully the programmer knows that it shouldn't be 0, but a correction like this is still possible; the compiler won't complain and the program won't crash. Depending on what 'a' is controlling, this could be very bad.

I'm really not arguing either way; I'm trying to make it clear, since no one seems to be getting Walter's positions.

BTW, what is it with people writing

    SomeObject foo;

if they believe the compiler should enforce explicit initialization? If you think an object should always be initialized at declaration, don't write a statement that only declares, and don't set a reference to null.
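A small D sketch of the two cases, with hypothetical function names: D default-initializes floating-point variables to NaN, so a forgotten initialization tends to surface as NaN in the results, while a dummy "= 0" produces a value that looks perfectly valid.

```d
import std.math : isNaN;

// Forgotten initializer: D fills in float.init, which is NaN.
float defaultFloat()
{
    float a;
    return a;
}

// Dummy initializer: silences the problem, and 0 looks like a real value.
float dummyFloat()
{
    float a = 0;
    return a;
}
```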
Sep 28 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 28 Sep 2009 21:43:20 -0400, Jesse Phillips  
<jessekphillips gmail.com> wrote:

 On Mon, 28 Sep 2009 16:01:10 -0400, Steven Schveighoffer wrote:

 On Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips
 <jesse.k.phillips+d gmail.com> wrote:

 language_fan Wrote:

 Have you ever used functional languages? When you develop in Haskell
 or SML, how often do you feel there is a good chance something will be
 initialized to the wrong value? Can you show some statistics that show
 how unsafe this practice is?

So isn't that the question? Does/can "default" (by human or machine) initialization create an incorrect state? If it does, do we continue to work as if nothing was wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, but that it is possible.

It creates an invalid, non-compiling program.

No it doesn't; I'm not referring to null as the invalid state.

    float a;

In this program it is invalid for 'a' to equal zero. If the compiler complains it is not initialized, the programmer could fulfill the requirements.

I am not arguing for floats (or any value types) to be required to be initialized.
 float a = 0;

 Hopefully the programmer knows that it shouldn't be 0, but a correction
 like this is still possible, the compiler won't complain and the program
 won't crash. Depending on what 'a' is controlling this could be very bad.

 I'm really not arguing either way; I'm trying to make it clear, since no
 one seems to be getting Walter's position.

I get his arguments, but I think they are based on a non-analogous situation. I think his arguments are based on his experience with compilers or corporate rules requiring what you were saying -- initializing all variables. We don't want that; we just want the developer to clarify "this variable is initialized" or "this variable is OK to be uninitialized".
 BTW, what is it with people writing

 SomeObject foo;

 If they believe the compiler should enforce explicit initialization? If
 you think an object should always be initialized at declaration don't
 write a statement that only declares and don't set a reference to null.

It's more complicated than that. For example, you *have* to write this for objects that are part of aggregates:

    class SomeOtherObject
    {
        SomeObject foo; // can't initialize here, because you need to use the heap, and the compiler only allows CTFE initialization.

        this()
        {
            foo = new SomeObject(); // here is where the initialization sits.
        }
    }

This is OK, but what if the initialization is buried, or you add another variable to a large class and forget to add the initializer to the constructor?

And there *are* cases where you *don't* want to initialize; that should also be possible:

    SomeObject? foo;

If this weren't part of the proposal, I'd agree with Walter 100%, but it gives the lazy programmer an easy way to default to the current behavior (easier than building some dummy object), so given the lazy nature of said programmer, they are more likely to do this than assign a dummy value.

-Steve
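Steve's worry about a large class whose constructor forgets an initializer can be partly caught in current D with a class invariant, which is checked when the constructor finishes. The sketch below is illustrative only: `SomeObject` and `SomeOtherObject` come from his post, while the `value` field and `get` method are made up. It is a run-time check that `-release` compiles out, not the compile-time guarantee the non-null proposal is after.

```d
import std.stdio;

class SomeObject
{
    int value = 42;   // made-up field, just so there is something to read
}

class SomeOtherObject
{
    SomeObject foo;   // default-initialized to null until the constructor assigns it

    this()
    {
        foo = new SomeObject();   // remove this line and the invariant below fails
    }

    // Checked after the constructor and around public member function calls,
    // but only in builds with contracts enabled (compiled out by -release).
    invariant
    {
        assert(foo !is null, "SomeOtherObject.foo was never initialized");
    }

    int get()
    {
        return foo.value;
    }
}

void main()
{
    auto o = new SomeOtherObject();
    writeln(o.get());   // prints 42
}
```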
Sep 29 2009
grauzone <none example.net> writes:
Walter Bright wrote:
 It is exactly analogous to a null pointer exception. And it's darned 
 useful.

On Linux, it just generates a segfault. And then you have no idea where the program went wrong. dmd outputting incorrect debugging information (so you have trouble using gdb or even addr2line) doesn't really help here. Not so useful.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 Walter Bright wrote:
 It is exactly analogous to a null pointer exception. And it's darned 
 useful.

On Linux, it just generates a segfault. And then you have no idea where the program went wrong. dmd outputting incorrect debugging information (so you have trouble using gdb or even addr2line) doesn't really help here.

Then the problem is incorrect dwarf output, not null pointers.
 Not so useful.

It's *still* far more useful than generating corrupt output and pretending all is ok.
Sep 26 2009
grauzone <none example.net> writes:
Walter Bright wrote:
 grauzone wrote:
 Walter Bright wrote:
 It is exactly analogous to a null pointer exception. And it's darned 
 useful.

On Linux, it just generates a segfault. And then you have no idea where the program went wrong. dmd outputting incorrect debugging information (so you have trouble using gdb or even addr2line) doesn't really help here.

Then the problem is incorrect dwarf output, not null pointers.

Indeed. I was just commenting in how badly the current D implementation handles it, and how useless the result is.
 Not so useful.

It's *still* far more useful than generating corrupt output and pretending all is ok.

But nobody argues in favor of that?
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 Walter Bright wrote:
 grauzone wrote:
 Not so useful.

It's *still* far more useful than generating corrupt output and pretending all is ok.

But nobody argues in favor of that?

It's implicit in the argument that some default should be used instead. That's what I'm trying to point out. Even forcing an explicit initializer doesn't actually solve the problem - my experience with such features is programmers simply insert any old value to get the code to pass the compiler, even programmers who know it's a bad idea do it anyway. It's a lot like why exception-specifications are a failure. See Bruce Eckel's essay on it: http://www.mindview.net/Etc/Discussions/CheckedExceptions
Sep 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 grauzone wrote:
 Walter Bright wrote:
 grauzone wrote:
 Not so useful.

It's *still* far more useful than generating corrupt output and pretending all is ok.

But nobody argues in favor of that?

It's implicit in the argument that some default should be used instead. That's what I'm trying to point out. Even forcing an explicit initializer doesn't actually solve the problem - my experience with such features is programmers simply insert any old value to get the code to pass the compiler, even programmers who know it's a bad idea do it anyway.

I think you're starting to be wrong at the point where you don't realize that many bugs come from references that people have forgotten to initialize. Once you acknowledge those, you will start to realize that a reference that must compulsively be initialized is valuable. You think from another perspective: you strongly believe that *most* of the time you can't or shouldn't initialize a reference. Your code in Phobos reflects that perspective. In the RegExp class, for example, you very often define a variable at the top of a long function and initialize it halfway through it. I trivially replaced such code with the correct code that defines symbols just where they're needed. Andrei
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Even forcing an explicit initializer doesn't actually solve the 
 problem - my experience with such features is programmers simply 
 insert any old value to get the code to pass the compiler, even 
 programmers who know it's a bad idea do it anyway.

I think you're starting to be wrong at the point where you don't realize that many bugs come from references that people have forgotten to initialize. Once you acknowledge those, you will start to realize that a reference that must compulsively be initialized is valuable.

The problem is it's worse to force people to provide an initializer. Most of the time that will work out ok, but the one silent bad value producing silent bad output overbalances all of it. Null pointer dereferences do not produce bad output that can be overlooked.

It isn't a theoretical problem with providing bad initializers just to shut the compiler up. I have seen it in the wild every time some manager required that code compile without warnings and the compiler warned about no initializer.

I'm very much a fan of increasing D's ability to detect and head off common mistakes, but it's really easy to tip into seducing programmers into writing bad code in order to avoid an overly nagging compiler.

There's the other problem of how to represent an "empty" value. You have to create a special object that then you have to either test for explicitly, or that has member functions that throw. You're no better off with that, and arguably worse off.
 You think from another perspective: you strongly believe that *most* of 
 the time you can't or shouldn't initialize a reference. Your code in 
 Phobos reflects that perspective. In the RegExp class, for example, you 
 very often define a variable at the top of a long function and 
 initialize it halfway through it. I trivially replaced such code with 
 the correct code that defines symbols just where they're needed.

That style doesn't reflect anything more than my old C habits which require declarations before any statements. I know it's bad style and do it less and less over time.
Sep 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Even forcing an explicit initializer doesn't actually solve the 
 problem - my experience with such features is programmers simply 
 insert any old value to get the code to pass the compiler, even 
 programmers who know it's a bad idea do it anyway.

I think you're starting to be wrong at the point where you don't realize that many bugs come from references that people have forgotten to initialize. Once you acknowledge those, you will start to realize that a reference that must compulsively be initialized is valuable.

The problem is it's worse to force people to provide an initializer.

You're not forcing. You just change the default. Really, it's *exactly* the same deal as with = void that you're so happy about. Andrei
Sep 26 2009
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Even forcing an explicit initializer doesn't actually solve the 
 problem - my experience with such features is programmers simply 
 insert any old value to get the code to pass the compiler, even 
 programmers who know it's a bad idea do it anyway.

I think you're starting to be wrong at the point where you don't realize that many bugs come from references that people have forgotten to initialize. Once you acknowledge those, you will start to realize that a reference that must compulsively be initialized is valuable.

The problem is it's worse to force people to provide an initializer.

You aren't forcing them. They decide for themselves. They determine whether it's appropriate for a particular variable to be null. You can achieve the same goal through contracts. However, this is much more verbose -- enough so that you'll only add these contracts when hunting down a bug. And if you have an array of things
 It isn't a theoretical problem with providing bad initializers just to 
 shut the compiler up. I have seen it in the wild every time some manager 
 required that code compile without warnings and the compiler warned 
 about no initializer.

C# requires that every variable be initialized before use. You know how often I get such an error? Maybe once for every 100 hours of coding. It's mainly for cases where I expect an integer to be initialized to 0 and it's not.

You know how often I provide a bad initializer to shut the compiler up? Never. This is partially because C#'s compiler has good flow analysis. It's mostly because:
- I declare variables where I use them, not beforehand.
- I often declare variables via IDE commands -- I write the code to fetch or calculate a value and assign it to a variable that doesn't exist, and the IDE fills in the type and declares it in the correct place.
- I usually don't have more than four or five local variables in a function (often no more than one or two). Out of 300KLOC, there are a few dozen functions that break this rule.

DMDFE functions are often long, complex, and have many local variables. I see how this would conflict with your coding style. You would have to add a few question marks for each function, and then you'd be done. DMDFE is ~60KLOC, but you could probably switch it over to this type system without structural changes to any function in a couple days.
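The contract-based alternative Christopher mentions looks roughly like this in D. `Thing` and `describe` are made-up names for the sketch. The null check has to be restated for each reference parameter and is only enforced at run time (contracts are removed by `-release`), which is the verbosity he is objecting to.

```d
import std.stdio;

class Thing
{
    string name;
    this(string name) { this.name = name; }
}

// The non-null requirement spelled out as an `in` contract. It must be
// repeated for every function that cares, and it is checked only at run time.
string describe(Thing t)
in
{
    assert(t !is null, "describe() called with a null Thing");
}
do
{
    return "Thing: " ~ t.name;
}

void main()
{
    writeln(describe(new Thing("widget")));   // prints Thing: widget
}
```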
Sep 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
language_fan wrote:
 Maybe Walter has not yet transitioned from the good olde Pascal/C style 
 programming to the C++/D/Java style?

Heh, there's still a Fortran influence in my code <g>.
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Walter Bright wrote:
 language_fan wrote:
 Maybe Walter has not yet transitioned from the good olde Pascal/C 
 style programming to the C++/D/Java style?

Heh, there's still a Fortran influence in my code <g>.

This may be a good time to ask about how these variables which can be declared anywhere in the function scope are implemented.

    void bar(bool foo) {
        if(foo) {
            int a = 1;
            ...
        }
        else {
            int a = 2;
            ...
        }
    }

Is the stack frame using two ints, or is the compiler seeing only one? I never bothered to check it out and just declared 'int a = void;' at the beginning of the routine to keep the stack frames as small as possible.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Jeremie Pelletier wrote:
 This may be a good time to ask about how these variables which can be 
 declared anywhere in the function scope are implemented.
 
 void bar(bool foo) {
     if(foo) {
         int a = 1;
         ...
     }
     else {
         int a = 2;
         ...
     }
 
 }
 
 is the stack frame using two ints, or is the compiler seeing only one? I 
 never bothered to check it out and just declared 'int a = void;' at the 
 beginning of the routine to keep the stack frames as small as possible.

They are completely independent variables. One may get assigned to a register, and not the other.
Sep 27 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Walter Bright wrote:
 Jeremie Pelletier wrote:
 This may be a good time to ask about how these variables which can be 
 declared anywhere in the function scope are implemented.

 void bar(bool foo) {
     if(foo) {
         int a = 1;
         ...
     }
     else {
         int a = 2;
         ...
     }

 }

 is the stack frame using two ints, or is the compiler seeing only one? 
 I never bothered to check it out and just declared 'int a = void;' at 
 the beginning of the routine to keep the stack frames as small as 
 possible.

They are completely independent variables. One may get assigned to a register, and not the other.

Ok, that's what I thought, so the good old C way of declaring variables at the top is not a bad thing yet :)
Sep 27 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jeremie Pelletier wrote:
 Walter Bright wrote:
 They are completely independent variables. One may get assigned to a
 register, and not the other.

Ok, that's what I thought, so the good old C way of declaring variables at the top is not a bad thing yet :)

Strange how you can look at the evidence and arrive at exactly the wrong conclusion. Declaring variables as close as possible to where they are used can reduce stack usage, and never increases it. -- Rainer Deyke - rainerd eldwood.com
Sep 27 2009
Rainer Deyke <rainerd eldwood.com> writes:
Jeremie Pelletier wrote:
 void bar(bool foo) {
     if(foo) {
         int a = 1;
         ...
     }
     else {
         int a = 2;
         ...
     }
 
 }
 
 is the stack frame using two ints, or is the compiler seeing only one? I
 never bothered to check it out and just declared 'int a = void;' at the
 beginning of the routine to keep the stack frames as small as possible.

OT, but declaring the variable at the top of the function increases stack size.

Example with changed variable names:

    void bar(bool foo) {
      if (foo) {
        int a = 1;
      } else {
        int b = 2;
      }
      int c = 3;
    }

In this example, there are clearly three different (and differently named) variables, but their lifetimes do not overlap. Only one variable can exist at a time, therefore the compiler only needs to allocate space for one variable. Now, if you move your declaration to the top:

    void bar(bool foo) {
      int a = void;
      if (foo) {
        a = 1;
      } else {
        a = 2; // Reuse variable.
      }
      int c = 3;
    }

You now only have two variables, but both of them coexist at the end of the function. Unless the compiler applies a clever optimization, the compiler is now forced to allocate space for two variables on the stack.

-- Rainer Deyke - rainerd eldwood.com
Sep 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Rainer Deyke wrote:
 OT, but declaring the variable at the top of the function increases
 stack size.
 
 Example with changed variable names:
 
   void bar(bool foo) {
     if (foo) {
       int a = 1;
     } else {
       int b = 2;
     }
     int c = 3;
   }
 
 In this example, there are clearly three different (and differently
 named) variables, but their lifetimes do not overlap.  Only one variable
 can exist at a time, therefore the compiler only needs to allocate space
 for one variable.  Now, if you move your declaration to the top:
 
   void bar(bool foo) {
     int a = void;
     if (foo) {
       a = 1;
     } else {
       a = 2; // Reuse variable.
     }
     int c = 3;
   }
 
 You now only have two variables, but both of them coexist at the end of
 the function.  Unless the compiler applies a clever optimization, the
 compiler is now forced to allocate space for two variables on the stack.

Not necessarily. The optimizer uses a technique called "live range analysis" to determine if two variables have non-overlapping ranges. It uses this for register assignment, but it could just as well be used for minimizing stack usage.
Sep 27 2009
Rainer Deyke <rainerd eldwood.com> writes:
Walter Bright wrote:
   void bar(bool foo) {
     int a = void;
     if (foo) {
       a = 1;
     } else {
       a = 2; // Reuse variable.
     }
     int c = 3;
   }

 You now only have two variables, but both of them coexist at the end of
 the function.  Unless the compiler applies a clever optimization, the
 compiler is now forced to allocate space for two variables on the stack.

Not necessarily. The optimizer uses a technique called "live range analysis" to determine if two variables have non-overlapping ranges. It uses this for register assignment, but it could just as well be used for minimizing stack usage.

That's the optimization I was referring to. It works for ints, but not for RAII types. It also doesn't (necessarily) work if you reorder the function:

    void bar(bool foo) {
      int a = void;
      int c = 3;
      if (foo) {
        a = 1;
      } else {
        a = 2; // Reuse variable.
      }
    }

Of course, a good optimizer can still reorder the declarations in this case, or even eliminate the whole function body (since it doesn't do anything).

-- Rainer Deyke - rainerd eldwood.com
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Rainer Deyke:

 Of course, a good optimizer can still reorder the declarations in this
 case, or even eliminate the whole function body (since it doesn't do
 anything).

LLVM has a good optimizer. If you try the LLVM demo on C code with LTO activated:

http://llvm.org/demo/index.cgi

This C code:

    void bar(int foo) {
      int a;
      int c = 3;
      if (foo) {
        a = 1;
      } else {
        a = 2;
      }
    }

produces a useful warning:

    /tmp/webcompile/_16254_0.c:3: warning: unused variable 'c'

and an empty function:

    define void bar(i32 %foo) nounwind readnone {
    entry:
      ret void
    }

Bye,
bearophile
Sep 27 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 27/09/2009 00:51, Walter Bright wrote:
 grauzone wrote:
 Walter Bright wrote:
 It is exactly analogous to a null pointer exception. And it's darned
 useful.

On Linux, it just generates a segfault. And then you have no idea where the program went wrong. dmd outputting incorrect debugging information (so you have troubles using gdb or even addr2line) doesn't really help here.

Then the problem is incorrect dwarf output, not null pointers.
 Not so useful.

It's *still* far more useful than generating corrupt output and pretending all is ok.

An exception trace is *far* better than a segfault, and that does not require null values. What's better?

a)

    auto a = new Class; // returns null (out of memory)
    a.foo = 5; // segfault since a is null

OR

b)

    auto a = new Class; // throws an out of memory exception
    a.foo = 5; // doesn't even reach here

No one here argues for option c, where a holds garbage and the program generates corrupt output.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Yigal Chripun wrote:
 An exception trace is *far* better than a segfault and that does not 
 require null values.

Seg faults are exceptions, too. You can even catch them (on windows)!
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Walter Bright wrote:
 Yigal Chripun wrote:
 An exception trace is *far* better than a segfault and that does not 
 require null values.

Seg faults are exceptions, too. You can even catch them (on windows)!

Walter, check the crash handler I submitted to D.announce; it has signal handlers on Linux to convert segfaults into D exception objects and throw them, so the code can unwind properly and even catch it. It has made my life so much easier; I barely need to run within a debugger anymore for most crashes. I don't know enough of Phobos and druntime to port it, but it's under a public domain license, so anyone is free to do it! </shameless plug>
Sep 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jeremie Pelletier wrote:
 Walter Bright wrote:
 Yigal Chripun wrote:
 An exception trace is *far* better than a segfault and that does not 
 require null values.

Seg faults are exceptions, too. You can even catch them (on windows)!

Walter, check the crash handler I submitted to D.announce; it has signal handlers on Linux to convert segfaults into D exception objects and throw them, so the code can unwind properly and even catch it. It has made my life so much easier; I barely need to run within a debugger anymore for most crashes. I don't know enough of Phobos and druntime to port it, but it's under a public domain license, so anyone is free to do it! </shameless plug>

I think that's great. Walter, Sean, please let's look into this. Andrei
Sep 27 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 27/09/2009 03:35, Walter Bright wrote:
 Yigal Chripun wrote:
 An exception trace is *far* better than a segfault and that does not
 require null values.

Seg faults are exceptions, too. You can even catch them (on windows)!

No, segfaults are *NOT* exceptions. The setup you mention is Windows-only, as Andrei said, and irrelevant for *nix. I develop on Unix (Solaris) and segfaults are a pain to deal with. Furthermore, even *IF* segfaults were transformed into exceptions in D, that still wouldn't make them proper exceptions, because true exceptions are thrown at the place of the error, which is not true for segfaults.

    T foo() {
      T t;
      ...logic
      if (error) return null;
      return t;
    }

Now, foo is buried deep in a lib. User code has:

    T t = someLib.foo();
    ... logic
    t.fubar = 4; // segfault: t is null

How is it better to segfault in t.fubar as opposed to throwing an exception inside foo?
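Yigal's contrast can be shown with a small, self-contained sketch. `T`, `fubar`, and the `error` flag follow his example; `fooReturnsNull` and `fooThrows` are hypothetical names for the two styles, and Phobos's `std.exception.enforce` is used to throw at the point of failure.

```d
import std.exception : enforce;
import std.stdio;

class T
{
    int fubar;
}

// Style 1: signal failure by returning null. The problem surfaces later,
// at whichever line first dereferences the result.
T fooReturnsNull(bool error)
{
    auto t = new T;
    // ... logic ...
    if (error)
        return null;
    return t;
}

// Style 2: throw where the error is detected, so the exception's message
// and stack trace point at the function that actually failed.
T fooThrows(bool error)
{
    auto t = new T;
    // ... logic ...
    enforce(!error, "fooThrows: could not produce a T");
    return t;
}

void main()
{
    try
    {
        T t = fooThrows(true);
        t.fubar = 4;                  // never reached
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg);   // prints caught: fooThrows: could not produce a T
    }

    T u = fooReturnsNull(true);
    writeln(u is null);               // prints true
    // u.fubar = 4;                   // would crash here, far from the cause
}
```

The design point is where the failure surfaces: `fooThrows` reports the problem inside the function that detected it, while `fooReturnsNull` defers it to whichever caller first dereferences the result.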
Sep 27 2009
language_fan <foo bar.com.invalid> writes:
Sat, 26 Sep 2009 18:38:56 -0500, Andrei Alexandrescu thusly wrote:

 Your code in
 Phobos reflects that perspective. In the RegExp class, for example, you
 very often define a variable at the top of a long function and
 initialize it halfway through it. I trivially replaced such code with
 the correct code that defines symbols just where they're needed.

Maybe Walter has not yet transitioned from the good olde Pascal/C style programming to the C++/D/Java style?
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 01:08:32 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  >  
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/  
  >
  >
  > :)

 I think he's wrong.

 Getting rid of null references is like solving the problem of dead  
 canaries in the coal mines by replacing them with stuffed toys.

 It all depends on what you prefer a program to do when it encounters a  
 program bug:

 1. Immediately stop and produce an indication that the program failed

 2. Soldier on and silently produce garbage output

 I prefer (1).

 Consider the humble int. There is no invalid value such that referencing  
 the invalid value will cause a seg fault. One case is an uninitialized  
 int is set to garbage, and erratic results follow. Another is that (in  
 D) ints are default initialized to 0. 0 may or may not be what the logic  
 of the program requires, and if it isn't, again, silently bad results  
 follow.

 Consider also the NaN value that floats are default initialized to. This  
 has the nice characteristic of you know your results are bad if they are  
 NaN. But it has the bad characteristic that you don't know where the NaN  
 came from. Don corrected this by submitting a patch that enables the  
 program to throw an exception upon trying to use a NaN. Then, you know  
 exactly where your program went wrong.

 It is exactly analogous to a null pointer exception. And it's darned  
 useful.

I don't understand you. You say you prefer 1, but describe the path D currently takes, which is 2!

    dchar d; // not initialized
    writeln(d); // Soldier on and silently produce garbage output

I don't see at all how it is related to a non-null default. Non-null default is all about avoiding erroneous situations, enforcing program correctness and stability. You solve an entire class of problem: NullPointerException.
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Denis Koroskin wrote:
 I don't understand you. You say you prefer 1, but describe the path D 
 currently takes, which is 2!
 
 dchar d; // not initialized
 writeln(d); // Soldier on and silently produce garbage output

d is initialized to the "invalid" unicode bit pattern of 0xFFFF. You'll see this if you put a printf in. The bug here is in writeln failing to recognize the invalid value. http://d.puremagic.com/issues/show_bug.cgi?id=3347
 I don't see at all how is it related to a non-null default.

Both are attempts to use invalid values.
 Non-null default is all about avoiding erroneous situations, enforcing 
 program correctness and stability. You solve an entire class of problem: 
 NullPointerException.

No, it just papers over the problem. The actual problem is the user failed to initialize it to a value that makes sense for his program. Setting it to a default value does not solve the problem.

Let's say the language is changed so that:

    int i;

is now illegal, and generates a compile time error message. What do you suggest the user do?

    int i = 0;

The compiler now accepts the code. But is 0 the correct value for the program? I guarantee you that programmers will simply insert "= 0" to get it to pass compilation, even if 0 is an invalid value for i for the logic of the program. (I guarantee it because I've seen it over and over, and the bugs that result.)

The point is, there really is no correct answer to the question "what should variables be default initialized to that will work correctly?" The best we can do is default initialize it to a NaN value, and then we can track usages of NaNs and know then that we have a program logic bug. A null reference is an ideal NaN value because attempts to use it will cause an immediate program halt with a findable indication of where the program logic went wrong. There's no avoiding it or pretending it didn't happen. There's no silently corrupt program output.
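The default initializers this argument rests on can be observed directly. The sketch below (the class `Foo` is only a placeholder) prints D's `.init` values for `int`, `float`, `dchar`, and a class reference. Only the reference stops the program when misused; the NaN propagates silently through arithmetic unless floating-point exceptions are enabled.

```d
import std.math : isNaN;
import std.stdio;

class Foo {}

void main()
{
    int i;      // 0: a perfectly valid int, so a missing initializer goes unnoticed
    float f;    // NaN: flows silently through arithmetic
    dchar d;    // U+FFFF: a Unicode noncharacter, the "invalid" pattern for dchar
    Foo o;      // null: the first dereference stops the program

    writeln(i);                     // prints 0
    writeln(isNaN(f));              // prints true
    writefln("%X", cast(uint) d);   // prints FFFF
    writeln(o is null);             // prints true

    float g = f * 2 + 1;            // no error here; the NaN just propagates
    writeln(isNaN(g));              // prints true
}
```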
Sep 26 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sun, 27 Sep 2009 02:03:40 +0400, Walter Bright
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
 I don't understand you. You say you prefer 1, but describe the path D  
 currently takes, which is 2!
  dchar d; // not initialized
 writeln(d); // Soldier on and silently produce garbage output

d is initialized to the "invalid" unicode bit pattern of 0xFFFF. You'll see this if you put a printf in. The bug here is in writeln failing to recognize the invalid value. http://d.puremagic.com/issues/show_bug.cgi?id=3347

Change dchar to float or an int. It's still not initialized (well, default-initialized to some garbage, which may or may not be okay for a programmer).
 I don't see at all how is it related to a non-null default.

Both are attempts to use invalid values.

No.
 Non-null default is all about avoiding erroneous situations, enforcing  
 program correctness and stability. You solve an entire class of  
 problem: NullPointerException.

No, it just papers over the problem. The actual problem is the user failed to initialize it to a value that makes sense for his program. Setting it to a default value does not solve the problem. Let's say the language is changed so that:

    int i;

is now illegal, and generates a compile time error message. What do you suggest the user do?

    int i = 0;

1) We are talking about non-null *references* here.
2) I'd suggest the user initialize it to a proper value. "int i;" is not the whole function, is it? All I'm saying is that "i" should be initialized before it is accessed, and that fact should be statically enforced by the compiler.
 The compiler now accepts the code. But is 0 the correct value for the  
 program? I guarantee you that programmers will simply insert "= 0" to  
 get it to pass compilation, even if 0 is an invalid value for i for the  
 logic of the program. (I guarantee it because I've seen it over and  
 over, and the bugs that result.)

This is absolutely irrelevant to non-null reference types. Programmer can't write "Object o = null;" to cheat on the type system.
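D has no built-in non-nullable reference type, but the flavor of Denis's point can be approximated with a library wrapper. In this sketch, `NonNull` is a hypothetical type (not part of Phobos) and `Foo` is a placeholder class. `@disable this()` makes a declaration without a value fail to compile. A null handed to the constructor, however, is only rejected at run time, at the point where it enters, so this is weaker than the compile-time rejection of `Object o = null;` that the proposal describes.

```d
import std.exception : enforce;
import std.stdio;

class Foo
{
    int x = 7;
}

// Hypothetical library stand-in for a non-nullable class reference.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this();   // no default construction: `NonNull!Foo n;` does not compile

    this(T p)
    {
        // The only way in: a null is rejected here, where it enters,
        // rather than at some later, distant dereference.
        enforce(p !is null, "NonNull!" ~ T.stringof ~ " constructed from null");
        payload = p;
    }

    T get()
    {
        return payload;
    }

    alias get this;    // usable wherever a plain T is expected
}

void main()
{
    auto n = NonNull!Foo(new Foo);
    writeln(n.x);            // prints 7 (member reached via alias this)

    Foo maybe;               // an ordinary, nullable reference that is still null
    try
    {
        auto m = NonNull!Foo(maybe);
    }
    catch (Exception e)
    {
        writeln(e.msg);      // prints NonNull!Foo constructed from null
    }
}
```

`alias get this` is what lets a `NonNull!Foo` stand in for a `Foo`, so `n.x` reaches the wrapped object's field without an explicit unwrap.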
Sep 26 2009
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 
 
  >
  >
  > :)
 
 I think he's wrong.
 
 Getting rid of null references is like solving the problem of dead 
 canaries in the coal mines by replacing them with stuffed toys.
 
 It all depends on what you prefer a program to do when it encounters a 
 program bug:

What do you define as a bug? Dereferencing a null pointer? Passing a null reference into a function that does not expect it? Storing a null reference in a variable whose later use does not expect one? Unexpectedly getting a null back from a function? ... You seem to be using the first definition which is meaningless to me. What I need to know is how the null ended up where it was unexpected.
 
 1. Immediately stop and produce an indication that the program failed

By most definitions above, D does not do this. I have more to say, but ran out of time to type this. I'll add more later...
Sep 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Denis Koroskin wrote:
 On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright 
 <newshound1 digitalmars.com> wrote:
 D has borrowed ideas from many different languages. The trick
 is to take the good stuff and avoid their mistakes <g>.

How about this one:

 
 
 :)

I think he's wrong. Getting rid of null references is like solving the problem of dead canaries in the coal mines by replacing them with stuffed toys. It all depends on what you prefer a program to do when it encounters a program bug:

What do you define as a bug?

The program doing something it was not deliberately programmed to do.
Sep 26 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 ...
 
 It all depends on what you prefer a program to do when it encounters a
 program bug:
 
 1. Immediately stop and produce an indication that the program failed
 
 2. Soldier on and silently produce garbage output
 
 I prefer (1).
 
 ...

*sigh* Walter, I really admire you as a programmer. But this is about the most blatant strawman argument I've seen in a while.

Firstly, as others have indicated, the whole point of non-null would be to REQUIRE initialisation to something useful.

"But the user will just assign to something useless to get around that!"

You mean like how everyone wraps every call in try{...}catch(Exception e){} to shut the damn exceptions up? Or uses pointer arithmetic and casts to get at those pesky private members?

If someone is actively trying to break the type system, it's their goddamn fault! Honestly, I don't care about the hacks they employ to defeat the system because they're going to go around blindly shooting themselves in the foot no matter what they do.

It's like arguing that safety rails are pointless because people can just jump over them. BESIDES, if they fall off, you get this really loud "crunch" followed by a shower of blood; then it's OBVIOUS that something's wrong.

And what about the people who AREN'T complete idiots, who maybe sometimes just accidentally trip and would quite welcome a safety rail there?

Besides which, the non-null proposal is always accompanied by the proposal to add nullable object references as T? (or whatever; the syntax is irrelevant at this point). If a programmer really wants a null-initialised object reference, which is she more likely to do?

    class NullFoo : Foo
    {
        void member1() { throw new Exception("NULL!"); }
        void member2() { throw new Exception("NULL!"); }
        ...
    }

    Foo bar = new NullFoo;

or

    Foo? bar;

Since the reason they're trying to circumvent the non-null protection is because of laziness, I assert they're far more likely to go with the second than the first. And it's STILL better because you couldn't implicitly cast between Foo? and Foo. They would HAVE to insert an explicit cast or check.

    Foo quxx = enforceNN(bar);

Finally, let me re-post something I wrote the last time this came up:
 The problem with null dereference problems isn't knowing that they're
 there: that's the easy part.  You helpfully get an exception to the
 face when that happens. The hard part is figuring out *where* the
 problem originally occurred. It's not when the exception is thrown
 that's the issue; it's the point at which you placed a null reference
 in a slot where you shouldn't have.

 Yes, we have invariants and contracts; but inevitably you're going to
 forget one, and it's that one slip-up that's going to bite you in the
 rear.

 One could use roughly the same argument for non-null references as for
 const: you could document it, but documentation is inevitably wrong or
 out of date.  :P

Sep 26 2009
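Daniel's `enforceNN(bar)` call, and the user-defined non-nullable type Walter mentions, can be approximated in today's D as a library wrapper. This is a minimal sketch, not the type constructor Andrei designed: `NonNull`, `enforceNN` and `Foo` are hypothetical names, and the null check is a run-time `enforce` performed once, when the wrapper is built, rather than a compile-time guarantee. What it does show is Daniel's point that the failure is reported where the null would enter, not at some later dereference.

```d
import std.exception : enforce;

// Hypothetical wrapper: holds a class reference that was checked
// for null exactly once, when the wrapper was constructed.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    // Forbid `NonNull!Foo x;` with no initializer.
    @disable this();

    this(T value)
    {
        enforce(value !is null,
                "null passed where NonNull!" ~ T.stringof ~ " was expected");
        payload = value;
    }

    inout(T) get() inout { return payload; }

    // Lets a NonNull!T be used wherever a T is expected.
    alias get this;
}

// Hypothetical helper named after the enforceNN call in the post.
NonNull!T enforceNN(T)(T value) if (is(T == class))
{
    return NonNull!T(value);
}

class Foo
{
    int member1() { return 42; }
}

void main()
{
    Foo bar = new Foo;            // an ordinary, nullable reference
    auto quxx = enforceNN(bar);   // the explicit check happens here
    assert(quxx.member1() == 42);

    Foo missing = null;
    bool threw = false;
    try
    {
        enforceNN(missing);
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw); // reported at the conversion, not at a later dereference
}
```

Because a `NonNull!Foo` can only come from a checked `Foo`, code that accepts one never needs its own null test; the cost is that every conversion site is explicit, which is the trade-off Daniel argues for.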
Walter Bright <newshound1 digitalmars.com> writes:
Daniel Keep wrote:
 "But the user will just assign to something useless to get around that!"
 
 You mean like how everyone wraps every call in try{...}catch(Exception
 e){} to shut the damn exceptions up?

They do just that in Java because of the checked-exceptions thing. I have a reference to Bruce Eckel's essay on it somewhere in this thread. The observation in the article was it wasn't just moron idiot programmers doing this. It was the guru programmers doing it, all the while knowing it was the wrong thing to do. The end result was the feature actively created the very problems it was designed to prevent.
 Or uses pointer arithmetic and
 casts to get at those pesky private members?

That's entirely different, because privacy is selected by the programmer, not the language. I don't have any issue with a user-defined type that is non-nullable (Andrei has designed a type constructor for that).
 If someone is actively trying to break the type system, it's their
 goddamn fault!  Honestly, I don't care about the hacks they employ to
 defeat the system because they're going to go around blindly shooting
 themselves in the foot no matter what they do.

True, but it's still not a good idea to design a language feature that winds up, in reality, encouraging bad programming practice. It encourages bad practice in a way that is really, really hard to detect in a code review. I like programming mistakes to be obvious, not subtle. There's nothing subtle about a null pointer exception. There's plenty subtle about the wrong default value.
 And what about the people who AREN'T complete idiots, who maybe
 sometimes just accidentally trip and would quite welcome a safety rail
 there?

Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.
 Finally, let me re-post something I wrote the last time this came up:
 
 The problem with null dereference problems isn't knowing that they're
 there: that's the easy part.  You helpfully get an exception to the
 face when that happens. The hard part is figuring out *where* the
 problem originally occurred. It's not when the exception is thrown
 that's the issue; it's the point at which you placed a null reference
 in a slot where you shouldn't have.


It's a lot harder to track down a bug when the bad initial value gets combined with a lot of other data first. The only time I've had a problem finding where a null came from (because they tend to fail very close to their initialization point) is when the null was caused by another memory corruption problem. Non-nullable references won't mitigate that.
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Ary Borenszweig wrote:
 Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)

There is no such thing as "not being able to happen" :)

    Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

I seem to be the only one who sees Walter's side of things in this thread :o)

For nonnulls to *really* be enforceable you'd have to get rid of the cast system entirely.
Sep 26 2009
Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)

There is no such thing as "not being able to happen" :)

    Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

I seem to be the only one who sees Walter's side of things in this thread :o)

For nonnulls to *really* be enforceable you'd have to get rid of the cast system entirely.

It's a systems programming language. You can screw up the type system if you really want to. But then it would still fall back to the lovely segfault. If you don't screw with it, you're safe with static checking. It's a clean win.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)

There is no such thing as "not being able to happen" :) Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

Object is not-nullable, Object? (or whatever syntax you like) is nullable. So that line is a compile-time error: you can't cast a null to an Object (because Object *can't* be null). You might be the only one here that understands Walter's point. But Walter is wrong. ;-)
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Ary Borenszweig wrote:
 Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)

There is no such thing as "not being able to happen" :) Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

Object is not-nullable, Object? (or whatever syntax you like) is nullable. So that line is a compile-time error: you can't cast a null to an Object (because Object *can't* be null). You might be the only one here that understands Walter's point. But Walter is wrong. ;-)

    union A { Object foo; Object? bar; }

Give me a type system, and I will find backdoors :)

I didn't say Walter was right or wrong, I said I understand his point of view. The sweet spot most likely lies in the middle of both arguments seen in this thread, and that's not an easy one to pinpoint!

I think we should much rather enforce variable initialization in D than nullable/non-nullable types. The error, after all, is that an uninitialized reference triggers a segfault.

What if using 'Object obj;' raises a warning "uninitialized variable" and makes everyone wanting non-null references happy, and 'Object obj = null;' raises no warning and makes everyone wanting to keep the current system (all two of us!) happy. I believe it's a fair compromise.
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Null pointer seg faults *not being able to happen* are much more safe. :)

There is no such thing as "not being able to happen" :) Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

Object is not-nullable, Object? (or whatever syntax you like) is nullable. So that line is a compile-time error: you can't cast a null to an Object (because Object *can't* be null). You might be the only one here that understands Walter's point. But Walter is wrong. ;-)

union A { Object foo; Object? bar; } Give me a type system, and I will find backdoors :)

Ah, nice one. Well, I see you can always break the type system. The point is to break it as little as possible while obtaining the most out of it without it bothering you.
Sep 26 2009
Christopher Wright <dhasenan gmail.com> writes:
Jeremie Pelletier wrote:
 What if using 'Object obj;' raises a warning "uninitialized variable" and 
 makes everyone wanting non-null references happy, and 'Object obj = 
 null;' raises no warning and makes everyone wanting to keep the current 
 system (all two of us!) happy.
 
 I believe it's a fair compromise.

It's a large improvement, but only for local variables. If your segfault has to do with a local variable, unless your function is monstrously large, it should be easy to fix, without changing the type system. The larger use case is when you have an aggregate member that cannot be null. This can be solved via contracts, but they are tedious to write and ubiquitous.
Sep 26 2009
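Christopher's aggregate-member case can be made concrete with the contract-based approach D already offers. This is a minimal sketch with hypothetical classes `Logger` and `Service`: an `in` contract rejects a null constructor argument, and a class `invariant` re-checks the field around every public method call. Both are compiled out with `-release`. The example uses the `do` keyword for the function body (older compilers spelled it `body`). It also illustrates his complaint: each non-null field needs its own hand-written assertion.

```d
import std.stdio : writeln;

class Logger
{
    void log(string msg) { writeln(msg); }
}

class Service
{
    private Logger logger; // must never be null

    // In-contract: a null argument fails here, at the construction site.
    this(Logger logger)
    in { assert(logger !is null, "Service requires a non-null Logger"); }
    do { this.logger = logger; }

    // Class invariant: checked after construction and on entry to and
    // exit from every public method (in non-release builds).
    invariant
    {
        assert(logger !is null, "Service.logger became null");
    }

    void run() { logger.log("running"); }
}

void main()
{
    auto s = new Service(new Logger);
    s.run(); // prints "running"
}
```

The checks catch the bad value close to its origin, but only because someone remembered to write them; that repetition is what a non-null reference type would move into the compiler.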
Jeremie Pelletier <jeremiep gmail.com> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 What if using 'Object obj;' raises a warning "uninitialized variable" 
 and makes everyone wanting non-null references happy, and 'Object obj 
 = null;' raises no warning and makes everyone wanting to keep the 
 current system (all two of us!) happy.

 I believe it's a fair compromise.

It's a large improvement, but only for local variables. If your segfault has to do with a local variable, unless your function is monstrously large, it should be easy to fix, without changing the type system. The larger use case is when you have an aggregate member that cannot be null. This can be solved via contracts, but they are tedious to write and ubiquitous.

But how would you enforce a nonnull type over an aggregate in the first place? If you can, you could also apply the same initializer semantics I suggested earlier. Look at this for example:

    struct A { Object cannotBeNull; }
    void main() { A* a = new A; }

Memory gets initialized to zero, and you have a broken non-null type. You could have the compiler throw an error here, but the compiler cannot possibly know about all data creation methods such as malloc, calloc or any other external allocator. You could even do something like:

    Object* foo = calloc(Object.sizeof);

and the compiler would let you dereference foo resulting in yet another broken nonnull variable.

Non-nulls are a cute idea when you have a type system that is much stricter than D's, but there are just way too many workarounds to make it crash in D.
Sep 26 2009
downs <default_357-line yahoo.de> writes:
Jeremie Pelletier wrote:
 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 What if using 'Object obj;' raises a warning "uninitialized variable"
 and makes everyone wanting non-null references happy, and 'Object obj
 = null;' raises no warning and makes everyone wanting to keep the
 current system (all two of us!) happy.

 I believe it's a fair compromise.

It's a large improvement, but only for local variables. If your segfault has to do with a local variable, unless your function is monstrously large, it should be easy to fix, without changing the type system. The larger use case is when you have an aggregate member that cannot be null. This can be solved via contracts, but they are tedious to write and ubiquitous.

But how would you enforce a nonnull type over an aggregate in the first place? If you can, you could also apply the same initializer semantics I suggested earlier. Look at this for example:

    struct A { Object cannotBeNull; }
    void main() { A* a = new A; }

Memory gets initialized to zero, and you have a broken non-null type. You could have the compiler throw an error here, but the compiler cannot possibly know about all data creation methods such as malloc, calloc or any other external allocator. You could even do something like:

    Object* foo = calloc(Object.sizeof);

and the compiler would let you dereference foo resulting in yet another broken nonnull variable.

Non-nulls are a cute idea when you have a type system that is much stricter than D's, but there are just way too many workarounds to make it crash in D.

"Here are some cases you haven't mentioned yet. This proves that the compiler can't possibly be smart enough. " Yeeeeeah. In the above case, why not implicitly put the cannotBeNull check into the struct invariant? That's where it belongs, imho. Regarding your example, it's calloc(size_t.sizeof). And a) we probably can't catch that case except with in/out null checks on every method, but then again, how often have you done that? I don't think it's relevant enough to be relevant to this thread. :)
Sep 27 2009
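downs's suggestion can be tried directly against Jeremie's `struct A`. In this minimal sketch the `describe` method is a hypothetical addition, since an invariant needs a public member function to be checked against. It shows where the check actually fires: `new A` zero-initializes the struct without running any constructor, so the invariant does not trigger there; it triggers on entry to the first public member function call, before the null field is dereferenced. Like all contracts, it is compiled out with `-release`.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

struct A
{
    Object cannotBeNull;

    // The null check downs suggests placing in the struct invariant.
    invariant
    {
        assert(cannotBeNull !is null, "A.cannotBeNull is null");
    }

    string describe() { return cannotBeNull.toString(); }
}

void main()
{
    // Jeremie's example: no constructor runs, so nothing is checked here
    // and the zero-initialized (null) field is accepted silently.
    A* a = new A;

    // The invariant fires on entry to describe(), before the dereference.
    assertThrown!AssertError(a.describe());

    a.cannotBeNull = new Object;
    assert(a.describe() == "object.Object");
}
```

So the invariant turns a segfault into an assertion with a message, but it still reports the problem at first use rather than at the allocation that produced the null.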
Jeremie Pelletier <jeremiep gmail.com> writes:
downs wrote:
"Here are some cases you haven't mentioned yet. This proves that the compiler can't possibly be smart enough. " Yeeeeeah.

I allocate most structs on the gc, unless I need them only for the scope of a function (that includes RVO). All objects are on the gc already, so it's a pretty major case. The argument was to protect aggregate fields; I'm just pointing out that their usage usually is preventing an easy implementation. I'm not saying it's impossible. Besides, what I said was, if it's possible to enforce these fields to be null/non-null, you can enforce them to be properly initialized in such case, making nulls/non-nulls nearly useless.
 In the above case, why not implicitly put the cannotBeNull check into the
struct invariant? That's where it belongs, imho.

Exactly, what's the need for null/non-null types then?
 Regarding your example, it's calloc(size_t.sizeof). And a) we probably can't
catch that case except with in/out null checks on every method, but then again,
how often have you done that? I don't think it's relevant enough to be relevant
to this thread. :)

Actually, sizeof currently returns the size of the reference, so it's always going to be the same as size_t.sizeof.
Sep 27 2009
downs <default_357-line yahoo.de> writes:
Jeremie Pelletier wrote:
 downs wrote:
"Here are some cases you haven't mentioned yet. This proves that the compiler can't possibly be smart enough. " Yeeeeeah.

I allocate most structs on the gc, unless I need them only for the scope of a function (that includes RVO). All objects are on the gc already, so it's a pretty major case. The argument was to protect aggregate fields; I'm just pointing out that their usage usually is preventing an easy implementation. I'm not saying it's impossible. Besides, what I said was, if it's possible to enforce these fields to be null/non-null, you can enforce them to be properly initialized in such case, making nulls/non-nulls nearly useless.
 In the above case, why not implicitly put the cannotBeNull check into
 the struct invariant? That's where it belongs, imho.

Exactly, what's the need for null/non-null types then?

You're twisting my words. Checking for null in the struct invariant would be an _implementation_ of non-nullable types in structs. Isn't the whole point of defaulting to non-nullable types that we don't have to check for it manually, i.e. in the user-defined invariant? I think we should avoid having to build recursive checks for null-ness for every type we define.
 Regarding your example, it's calloc(size_t.sizeof). And a) we probably
 can't catch that case except with in/out null checks on every method,
 but then again, how often have you done that? I don't think it's
 relevant enough to be relevant to this thread. :)

Actually, sizeof currently returns the size of the reference, so it's always going to be the same as size_t.sizeof.

Weird. I remembered that differently. Thanks.
Sep 27 2009
"Nick Sabalausky" <a a.a> writes:
"Jeremie Pelletier" <jeremiep gmail.com> wrote in message 
news:h9mmre$1i8j$1 digitalmars.com...
 Ary Borenszweig wrote:
 Object is not-nullable, Object? (or whatever syntax you like) is 
 nullable. So that line is a compile-time error: you can't cast a null to 
 an Object (because Object *can't* be null).

union A { Object foo; Object? bar; } Give me a type system, and I will find backdoors :)

Unions are nothing more than an alternate syntax for a reinterpret cast. And it's an arguably worse syntax because, unlike casts, uses of it are indistinguishable from normal safe code; there's nothing to grep for. As such, unions should never be considered any more safe than cast(x)y. The following is just as dangerous as your example above and doesn't even touch the issue of nullability/non-nullability:

    union A { int foo; float bar; }
Sep 28 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Nick Sabalausky wrote:
 "Jeremie Pelletier" <jeremiep gmail.com> wrote in message 
 news:h9mmre$1i8j$1 digitalmars.com...
 Ary Borenszweig wrote:
 Object is not-nullable, Object? (or whatever syntax you like) is 
 nullable. So that line is a compile-time error: you can't cast a null to 
 an Object (because Object *can't* be null).

union A {
    Object foo;
    Object? bar;
}
Give me a type system, and I will find backdoors :)

Unions are nothing more than an alternate syntax for a reinterpret cast. And it's an arguably worse syntax because, unlike casts, uses of it are indistinguishable from normal safe code; there's nothing to grep for. As such, unions should never be considered any safer than cast(x)y. The following is just as dangerous as your example above and doesn't even touch the issue of nullability/non-nullability:

union A {
    int foo;
    float bar;
}

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa, for example. And yes, they are much, much more than a simple reinterpret cast: a simple set of casts will not set the size of the union to its largest member. Unions make for elegant types which can have many valid representations:

union Vec3F {
    struct { float x, y, z; }
    float[3] v;
}

I just can't picture D without unions :)
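Jeremie's `Vec3F` gives two views of the same twelve bytes: named components and an array that can be looped over. A small sketch of how the two views interact; the helper `squaredLength` is hypothetical.

```d
// Jeremie's example: named components and an array over the same storage.
union Vec3F
{
    struct
    {
        float x, y, z;
    }
    float[3] v;
}

// Loop over the array view, no matter how the components were set.
float squaredLength(const Vec3F a)
{
    float total = 0;
    foreach (c; a.v)
        total += c * c;
    return total;
}
```

Writing `a.y` and reading `a.v[1]` touch the same float, and the union is no larger than its largest member.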
Sep 28 2009
next sibling parent reply Jari-Matti Mäkelä <jmjmak utu.fi.invalid> writes:
Jeremie Pelletier wrote:

 Nick Sabalausky wrote:
 union A {
 int foo;
 float bar;
 }
 

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa for example.

You could add built-in methods for those operations to the float type:

float bar;
bool s = bar.sign;
...

Union is very flexible, but unfortunately it's also one of the features that can break type safety in D.
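Something close to Jari-Matti's `bar.sign` can already be written as library code with uniform function call syntax (UFCS). This is a sketch assuming IEEE 754 single precision; the names `bitPattern`, `sign`, `exponent` and `mantissa` are hypothetical, not built-in float properties. Note that the sign test is a mask, not a shift.

```d
// Reinterpret a float's storage as an unsigned integer (IEEE 754 binary32 assumed).
uint bitPattern(float f)
{
    union Pun { float f; uint u; }
    Pun p;
    p.f = f;
    return p.u;
}

// Sign bit: a mask test, no shift needed.
bool sign(float f) { return (bitPattern(f) & 0x8000_0000) != 0; }

// Biased exponent: bits 23..30.
uint exponent(float f) { return (bitPattern(f) >> 23) & 0xFF; }

// Stored fraction (mantissa without the implicit leading 1): bits 0..22.
uint mantissa(float f) { return bitPattern(f) & 0x7F_FFFF; }
```

With these in scope, `bar.sign` and `bar.exponent` read like built-in properties. Whether a given compiler reduces them to single instructions after inlining is implementation-dependent.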
Sep 28 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Jari-Matti Mäkelä wrote:
 Jeremie Pelletier wrote:
 
 Nick Sabalausky wrote:
 union A {
 int foo;
 float bar;
 }

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa for example.

You could add built-in methods for those operations to the float type:

float bar;
bool s = bar.sign;
...

That would be so inefficient in some cases; you don't always want to shift the data the way bar.sign implies.
 Union is very flexible, but unfortunately it's also one of the features that 
 can break the type safety in D.

That's the best thing about systems languages: to have a core set of rules, and to be able to purposely break them. Even better, you still pass go and still get $200. I don't want a language that takes me by the hand for a walk in the park. I want a language that keeps me on my toes and punches me in the face every now and then :)
Sep 28 2009
parent reply Jari-Matti Mäkelä <jmjmak utu.fi.invalid> writes:
Jeremie Pelletier wrote:

 Jari-Matti Mäkelä wrote:
 Jeremie Pelletier wrote:
 
 Nick Sabalausky wrote:
 union A {
 int foo;
 float bar;
 }

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa for example.

You could add built-in methods for those operations to the float type:

float bar;
bool s = bar.sign;
...

That would be so inefficient in some cases; you don't always want to shift the data the way bar.sign implies.

It depends on the boolean representation. I see no reason why a built-in feature should be slower than some bitwise logic operation in user code. After all, the set of operations the language provides for the user is a subset of all possible operations the language implementation can do.
Sep 28 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Jari-Matti M.:

 It depends on the boolean representation. I see no reason why a built-in 
 feature should be slower than some bitwise logic operation in user code. 
 After all, the set of operations the language provides for the user is a 
 subset of all possible operations the language implementation can do.

I agree. One of the best qualities of C++ is that it often allows programmers to build abstractions at no or minimal cost. A good systems language is one that lets you define a built-in-looking syntactic construct (for example, a function) that lets you access and use parts of a floating-point number with the same efficiency as C/asm code. Bye, bearophile
Sep 28 2009
prev sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 28/09/2009 12:05, Jeremie Pelletier wrote:
 Nick Sabalausky wrote:
 "Jeremie Pelletier" <jeremiep gmail.com> wrote in message
 news:h9mmre$1i8j$1 digitalmars.com...
 Ary Borenszweig wrote:
 Object is not-nullable, Object? (or whatever syntax you like) is
 nullable. So that line is a compile-time error: you can't cast a
 null to an Object (because Object *can't* be null).

union A {
    Object foo;
    Object? bar;
}
Give me a type system, and I will find backdoors :)

Unions are nothing more than an alternate syntax for a reinterpret cast. And it's an arguably worse syntax because, unlike casts, uses of it are indistinguishable from normal safe code; there's nothing to grep for. As such, unions should never be considered any safer than cast(x)y. The following is just as dangerous as your example above and doesn't even touch the issue of nullability/non-nullability:

union A {
    int foo;
    float bar;
}

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa, for example. And yes, they are much, much more than a simple reinterpret cast: a simple set of casts will not set the size of the union to its largest member. Unions make for elegant types which can have many valid representations:

union Vec3F {
    struct { float x, y, z; }
    float[3] v;
}

I just can't picture D without unions :)

Here's a type-safe alternative (note: untested):

struct Vec3F {
    float[3] v;
    alias v[0] x;
    alias v[1] y;
    alias v[2] z;
}

D provides alignment control for structs; why do we need a separate union construct if it is just a special case of struct alignment? IMO the use cases for union are very rare, and they can all be redesigned in a type-safe manner. When software was small and simple, hand-tuning code with low-level mechanisms (such as unions and even assembly) made a lot of sense. Today's software is typically far more complex and is way too big to risk losing safety features for marginal performance gains. Micro-optimizations simply don't scale.
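D does not accept an `alias` of an indexing expression such as `v[0]`, so the struct above is rejected as written. One type-safe variant that does compile exposes the named components as `@property` functions returning `ref float`. A sketch; the name `Vec3S` is hypothetical, chosen only to avoid clashing with the union version.

```d
// Array storage with named accessors instead of an overlapping union.
struct Vec3S
{
    float[3] v = 0;

    // Each accessor returns a reference into v, so it can be read and assigned.
    @property ref float x() return { return v[0]; }
    @property ref float y() return { return v[1]; }
    @property ref float z() return { return v[2]; }
}
```

This keeps a single representation (the array) and gets `a.x = 1` syntax without any overlapping storage.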
Sep 28 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Yigal Chripun wrote:
 On 28/09/2009 12:05, Jeremie Pelletier wrote:
 Nick Sabalausky wrote:
 "Jeremie Pelletier" <jeremiep gmail.com> wrote in message
 news:h9mmre$1i8j$1 digitalmars.com...
 Ary Borenszweig wrote:
 Object is not-nullable, Object? (or whatever syntax you like) is
 nullable. So that line is a compile-time error: you can't cast a
 null to an Object (because Object *can't* be null).

union A {
    Object foo;
    Object? bar;
}
Give me a type system, and I will find backdoors :)

Unions are nothing more than an alternate syntax for a reinterpret cast. And it's an arguably worse syntax because, unlike casts, uses of it are indistinguishable from normal safe code; there's nothing to grep for. As such, unions should never be considered any safer than cast(x)y. The following is just as dangerous as your example above and doesn't even touch the issue of nullability/non-nullability:

union A {
    int foo;
    float bar;
}

Yet it's the only way I know of to do bitwise logic on floating points in D to extract the exponent, sign and mantissa, for example. And yes, they are much, much more than a simple reinterpret cast: a simple set of casts will not set the size of the union to its largest member. Unions make for elegant types which can have many valid representations:

union Vec3F {
    struct { float x, y, z; }
    float[3] v;
}

I just can't picture D without unions :)

Here's a type-safe alternative (note: untested):

struct Vec3F {
    float[3] v;
    alias v[0] x;
    alias v[1] y;
    alias v[2] z;
}

D provides alignment control for structs; why do we need a separate union construct if it is just a special case of struct alignment?

These aliases won't compile, and that was only one out of many union uses.
 IMO the use cases for union are very rare and they all can be redesigned 
 in a type safe manner.

Not always true.
 when software was small and simple, hand tuning code with low level 
 mechanisms (such as unions and even using assembly) made a lot of sense. 
 Today's software is typically far more complex and is way too big to risk 
 losing safety features for marginal performance gains.
 
 Micro-optimizations simply don't scale.

Again, that's a lazy view on programming. High level constructs are useful to isolate small and simple algorithms which are implemented at low level. These aren't just marginal performance gains, they can easily be up to 15-30% improvements, sometimes 50% and more. If this is too complex or the risk is too high for you then don't use a systems language :)
Sep 28 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 Not always true.

I agree, I'm using D partly because it offers unions. Sometimes they are useful. But besides normal C unions, which I don't want to remove, it could also be useful to have the safe automatic tagged unions of Cyclone. They are safer and cost only a little performance compared to C unions. In D they might be denoted with "record" or "tunion", or just "safe union" to save keywords. They would always contain an invisible tag (that can be read with a special built-in union method, like Unionname.tagcheck). Such "safe unions" might even become the only ones allowed in SafeD modules! The following is from the Cyclone docs:

<< The C Standard says that if you read out any member of a union other than the last one written, the result is undefined. To avoid this problem, Cyclone provides a built-in form of tagged union and always ensures that the tag is correlated with the last member written in the union. In particular, whenever a tagged union member is updated, the compiler inserts code to update the tag associated with the union. Whenever a member is read, the tag is consulted to ensure that the member was the last one written. If not, an exception is thrown. Thus, the aforementioned example can be rewritten in Cyclone like this:

tagged union U { int i; int *p; };
void pr(union U x) {
    if (tagcheck(x.i))
        printf("int(%d)",x.i);
    else
        printf("ptr(%d)",*x.p);
}

The tagged qualifier indicates to the compiler that U should be a tagged union. The operation tagcheck(x.i) returns true when i was the last member written so it can be used to extract the value.
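Without compiler support, the Cyclone behaviour can be approximated in D by hand: a struct holding an explicit tag next to an anonymous union, with accessors that update the tag on write and check it on read. This is a sketch only; `TaggedU`, `setI`, `getI` and the other names are hypothetical, and Phobos offers library versions of the idea (`std.variant.Algebraic`, and `std.sumtype` in later releases).

```d
import std.exception : enforce;

// A hand-rolled tagged union in the spirit of Cyclone's "tagged union U".
struct TaggedU
{
    enum Tag { none, i, p }

    private Tag tag_ = Tag.none;
    union
    {
        private int i_;
        private int* p_;
    }

    // Writing a member also records which member is live.
    void setI(int v) { i_ = v; tag_ = Tag.i; }
    void setP(int* v) { p_ = v; tag_ = Tag.p; }

    // Rough equivalent of Cyclone's tagcheck().
    Tag tag() const { return tag_; }

    // Reading checks the tag first and throws on a mismatch.
    int getI() const
    {
        enforce(tag_ == Tag.i, "TaggedU: i is not the active member");
        return i_;
    }

    int* getP()
    {
        enforce(tag_ == Tag.p, "TaggedU: p is not the active member");
        return p_;
    }
}
```

The cost is one extra field and a comparison per read, which matches bearophile's "a little less performance" description.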


 Again, that's a lazy view on programming. High level constructs are 
 useful to isolate small and simple algorithms which are implemented at 
 low level.

Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't that necessary, because those operations are done only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps. This can be seen in how HotSpot (and modern dynamic-language JITs) work: usually virtual calls like the ones you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or such a dynamic call is left in the middle of critical code, it may lead to a slower program. That's why I have Java code that runs 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods. ------------------------------------ More quotations from the Cyclone documentation:
In contrast, Cyclone's analysis extends to struct, union members, and pointer
contents to ensure everything is initialized before it is used. This has two
benefits: First, we tend to catch more bugs this way, and second, programmers
don't pay for the overhead of automatic initialization on top of their own
initialization code.<

This is right on-topic:
This requires little effort from the programmer, but the NULL checks slow down
getc. To repair this, we have extended Cyclone with a new kind of pointer,
called a “never-NULL” pointer, and indicated with ‘@’ instead of ‘*’. For
example, in Cyclone you can declare

indicating that getc expects a non-NULL FILE pointer as its argument. This one-character change tells Cyclone that it does not need to insert NULL checks into the body of getc. If getc is called with a possibly-NULL pointer, Cyclone will insert a NULL check at the call :<
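D has no never-NULL pointer in the type system, but the same idea (check once at the boundary, then trust the value) can be approximated in library code. A minimal sketch assuming a hypothetical `NonNull` template: the null check happens in the constructor, and `@disable this();` rules out a default-initialized (null) instance.

```d
import std.exception : enforce;

// Hypothetical wrapper: a class reference that is checked once, at construction.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this(); // no default construction, so no null by default

    this(T p)
    {
        enforce(p !is null, "NonNull: null reference");
        payload = p;
    }

    // Callers can use the payload directly without re-checking for null.
    inout(T) get() inout { return payload; }
    alias get this;
}
```

A function taking `NonNull!Object` then needs no null checks in its body, much like Cyclone's `getc`. It is a convention enforced by construction rather than a type-system guarantee, since `NonNull!Object.init` can still carry a null payload.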
Goto
C's goto statements can lead to safety violations when they are used to
jump into scopes. Here is a simple example:

int z;
{ int x = 0xBAD; goto L; }
{ int *y = &z;
L: *y = 3; // Possible segfault
}

Cyclone's static analysis detects this situation and signals an error. A goto that does not enter a scope is safe, and is allowed in Cyclone. We apply the same analysis to switch statements, which suffer from a similar vulnerability in C.< Bye, bearophile
Sep 28 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
bearophile wrote:
 Jeremie Pelletier:
 Again, that's a lazy view on programming. High level constructs are 
 useful to isolate small and simple algorithms which are implemented at 
 low level.

Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't that necessary, because those operations are done only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps. This can be seen in how HotSpot (and modern dynamic-language JITs) work: usually virtual calls like the ones you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or such a dynamic call is left in the middle of critical code, it may lead to a slower program. That's why I have Java code that runs 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods.

Certainly agreed on virtual calls: on my machine, I timed a simple example as executing 65 interface calls per microsecond, 85 virtual calls per microsecond, and 210 non-member function calls per microsecond. So you should almost never worry about the cost of interface calls since they're so cheap, but they are 3.5 times slower than non-member functions. In most cases, the body of a method is a lot more expensive than the method call, so even when optimizing, it won't often benefit you to use free functions rather than class or interface methods.
Sep 28 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 bearophile wrote:
 Jeremie Pelletier:
 Again, that's a lazy view on programming. High level constructs are 
 useful to isolate small and simple algorithms which are implemented 
 at low level.

Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't that necessary, because those operations are done only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps. This can be seen in how HotSpot (and modern dynamic-language JITs) work: usually virtual calls like the ones you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or such a dynamic call is left in the middle of critical code, it may lead to a slower program. That's why I have Java code that runs 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods.

Certainly agreed on virtual calls: on my machine, I timed a simple example as executing 65 interface calls per microsecond, 85 virtual calls per microsecond, and 210 non-member function calls per microsecond. So you should almost never worry about the cost of interface calls since they're so cheap, but they are 3.5 times slower than non-member functions.

Thanks for posting these interesting numbers. I seem to recall that interface dispatch in D does a linear search in the interfaces list, so you may want to repeat your tests with a variable number of interfaces and a variable position of the interface being used. Andrei
Sep 28 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:

 Certainly agreed on virtual calls: on my machine, I timed a simple 
 example as executing 65 interface calls per microsecond, 85 virtual 
 calls per microsecond, and 210 non-member function calls per 
 microsecond. So you should almost never worry about the cost of 
 interface calls since they're so cheap, but they are 3.5 times slower 
 than non-member functions.


The main problem with virtual calls in D is the missed inlining opportunities. ------------ Andrei Alexandrescu:
 I seem to recall that 
 interface dispatch in D does a linear search in the interfaces list, so 
 you may want to repeat your tests with a variable number of interfaces, 
 and a variable position of the interface being used.

The following is a D port of the well-known "Richards" benchmark. This specific version is object-oriented, its classes are final (otherwise the code gets considerably slower with LDC), and it has getters/setters. It contains an interface: http://codepad.org/kO3MJK60 You can run it at the command line giving it 10000000. On a Celeron 2 GHz, if you replace the interface with an abstract class, the running time goes from 2.16 to 1.58 seconds, compiled with: ldc -O5 -release -inline Compiled with DMD, the running time seems about unchanged. I have no idea why. Maybe some of you can tell me. In a day or two I'll release many more timings and tests about this Richards benchmark. Bye, bearophile
Sep 28 2009
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-09-28 15:36:05 -0400, bearophile <bearophileHUGS lycos.com> said:

 Compiled with DMD the running time seems about unchanged. I have no 
 idea why. Maybe some of you can tell me.

If I recall correctly, implementing an interface adds a variable to a class which contains a pointer to that interface's vtable implementation for that particular class. An interface pointer points to that variable inside the object instead (not at the beginning of the object's allocated space), and calling a function on it involves dereferencing the interface's vtable and calling the right function. Obtaining the real "this" pointer for calling the function involves looking at the first value in the interface's vtable, which contains an offset you can subtract from the interface pointer to get the object pointer. So basically, if I recall correctly how it works, calling a function on an interface reference involves one more subtraction than calling a member function on a class reference, which is pretty marginal. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
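The "pointer into the middle of the object" part of Michel's description can be observed from ordinary D code: converting a class reference to an interface yields a different address, and casting back recovers the original object. A sketch with hypothetical names `I`, `C` and `interfaceAdjust`; the exact offset value is an implementation detail.

```d
interface I
{
    int f();
}

class C : I
{
    int value = 10;
    int f() { return value; }
}

// Byte distance between an object and an interface reference to it.
ptrdiff_t interfaceAdjust(C c)
{
    I i = c; // implicit upcast: the compiler adds the interface's offset
    return (cast(ubyte*)cast(void*)i) - (cast(ubyte*)cast(void*)c);
}
```

The difference is nonzero because offset 0 of a class instance holds the class's own vtable pointer, so the interface's vtable pointer must live elsewhere in the object.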
Sep 28 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Thanks for posting these interesting numbers. I seem to recall that 
 interface dispatch in D does a linear search in the interfaces list, so 
 you may want to repeat your tests with a variable number of interfaces, 
 and a variable position of the interface being used.

No, it is done with one indirection.

interface IA { void foo(); }
interface IB : IA { }
class C : IA { void foo() { } }
void test(C c) { c.foo(); }
========================================
test:
        enter   4,0
        mov     ECX,[EAX]
        call    dword ptr 014h[ECX]
        leave
        ret
Sep 28 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong on such a basic thing, then I think there's a problem. It would be good to create an HTML page that explains how some basic things of D are implemented in the front-end. Such a page could also contain box-and-arrow images that show how structures and memory are organized for various such data structures. Such an HTML page would be useful both for normal programmers who want to understand what's under the hood and for people who may want to fix/modify the front-end. Bye, bearophile
Sep 29 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Walter Bright:
 
 No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong on such a basic thing, then I think there's a problem. It would be good to create an HTML page that explains how some basic things of D are implemented in the front-end. Such a page could also contain box-and-arrow images that show how structures and memory are organized for various such data structures. Such an HTML page would be useful both for normal programmers who want to understand what's under the hood and for people who may want to fix/modify the front-end. Bye, bearophile

I agree, the ABI documentation on digitalmars.com is far from complete; I had to learn a lot of it through trial and error. What was especially confusing was the interface reference vs. the interface info vs. the interface's classinfo vs. the referenced object. I wrote an internal wrapper struct to make most of the casts go away:

struct Interface {
    Object object() const {
        return cast(Object)(cast(void*)&this - interfaceinfo.offset);
    }
    immutable(InterfaceInfo)* interfaceinfo() const {
        return **cast(InterfaceInfo***)&this;
    }
    immutable(ClassInfo) classinfo() const {
        return interfaceinfo.classinfo;
    }
}

immutable struct InterfaceInfo {
    ClassInfo classinfo;
    void*[] vtbl;
    ptrdiff_t offset;
}

These two made implementing D internals a whole lot easier! I think only InterfaceInfo is in druntime (and it's confusingly named Interface in there).
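The druntime `Interface` record Jeremie mentions is reachable through `TypeInfo_Class.interfaces`, and its `offset` field can be compared with the address adjustment observed when converting an object to an interface. A sketch with hypothetical types `J` and `Impl`; it assumes the current druntime layout, where `Interface` has `classinfo`, `vtbl` and a `size_t offset`.

```d
interface J
{
    int g();
}

class Impl : J
{
    int field = 3;
    int g() { return field; }
}

// Offset that druntime's metadata records for Impl's implementation of J.
size_t recordedOffset()
{
    TypeInfo_Class ci = typeid(Impl);
    return ci.interfaces[0].offset;
}

// Offset observed when an Impl reference is converted to a J reference.
size_t observedOffset(Impl obj)
{
    J j = obj;
    return cast(size_t)((cast(ubyte*)cast(void*)j) - (cast(ubyte*)cast(void*)obj));
}
```

If the two agree, the `offset` field is exactly the amount subtracted to get from an interface reference back to the object, which is what Jeremie's `object()` helper relies on.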
Sep 29 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 If even Andrei, a quite intelligent person that has written big books
 on C++, may be wrong on such a basic thing, then I think there's a
 problem.

Not everyone is an expert on everything, and how vptrs and vtbl[]s and casting actually work for multiple inheritance is far from being a basic thing. Furthermore, different compilers implement these things differently. Last I heard, Java did it the way Andrei described. Don Clugston wrote an article a few years ago on this, and found a wide variety of implementation strategies. The Digital Mars one was the fastest <g>.
Sep 29 2009
parent Don <nospam nospam.com> writes:
Dejan Lekic wrote:
 Walter, is that article publicly available?

http://www.codeproject.com/KB/cpp/FastDelegate.aspx
Oct 02 2009
prev sibling parent reply "Saaa" <empty needmail.com> writes:
bearophile wrote
 Walter Bright:

No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong on such a basic thing, then I think there's a problem. It would be good to create an HTML page that explains how some basic things of D are implemented in the front-end. Such a page could also contain box-and-arrow images that show how structures and memory are organized for various such data structures. Such an HTML page would be useful both for normal programmers who want to understand what's under the hood and for people who may want to fix/modify the front-end.

?:) I seem to have requested the very thing you're asking for here (within 24 hours, even): http://d.puremagic.com/issues/show_bug.cgi?id=3351
Sep 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Saaa wrote:
 bearophile wrote
 Walter Bright:

 No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong on such a basic thing, then I think there's a problem. It would be good to create an HTML page that explains how some basic things of D are implemented in the front-end. Such a page could also contain box-and-arrow images that show how structures and memory are organized for various such data structures. Such an HTML page would be useful both for normal programmers who want to understand what's under the hood and for people who may want to fix/modify the front-end.

?:) I seem to have requested the very thing you're asking for here (within 24 hours, even): http://d.puremagic.com/issues/show_bug.cgi?id=3351

I wonder whether this would be a good topic for TDPL. Currently I'm thinking it's too low-level. I do plan to insert a short section about implementation, just not go deep inside the object model. Andrei
Sep 30 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Saaa wrote:
 bearophile wrote
 Walter Bright:

 No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong on such a basic thing, then I think there's a problem. It would be good to create an HTML page that explains how some basic things of D are implemented in the front-end. Such a page could also contain box-and-arrow images that show how structures and memory are organized for various such data structures. Such an HTML page would be useful both for normal programmers who want to understand what's under the hood and for people who may want to fix/modify the front-end.

?:) I seem to have requested the very thing you're asking for here (within 24 hours, even): http://d.puremagic.com/issues/show_bug.cgi?id=3351

I wonder whether this would be a good topic for TDPL. Currently I'm thinking it's too low-level. I do plan to insert a short section about implementation, just not go deep inside the object model. Andrei

Maybe that's a topic for an appendix of the book. It is really useful to know the internals of a language, even if you don't directly use them it can impact design choices. Right now the best way to learn these internals is still to go hack and slash with the compiler's runtime implementation. Besides, there is no such thing as too low-level :)
Sep 30 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I wonder whether this would be a good topic for TDPL. Currently I'm 
 thinking it's too low-level. I do plan to insert a short section about 
 implementation, just not go deep inside the object model.

It's a very good topic for the book. Any good book about computer languages teaches not just a language, but also good programming practices and some general computer science too. In a big book about a systems language I want to see "under the cover" topics too, otherwise I'll need to buy another book to learn them :-) So it's good for a book about a systems language to explain how some parts of the compiler are implemented, because such parts are code too (and the level of such code could be the same, if someday someone translates the D front-end to D). For example, I appreciated the chapter about the Python dict implementation in "Beautiful Code". I think you aren't interested in my help any more, but I hope you will follow this suggestion of mine (I'll buy your book anyway, but I know what I'd like to find in it). On the other hand, writing about topics you don't know enough about may be negative; in such a situation, avoiding the topic may be better. Bye, bearophile
Sep 30 2009
prev sibling parent reply "Saaa" <empty needmail.com> writes:
Andrei Alexandrescu wrote
 I wonder whether this would be a good topic for TDPL. Currently I'm 
 thinking it's too low-level. I do plan to insert a short section about 
 implementation, just not go deep inside the object model.

 Andrei

I'd really love to see more about implementations, as it makes me twitch to use something whose impact I don't really know. As for using diagrams and other visual presentations: please use them as much as possible; e.g. pointers without arrows are like a film without moving pictures :)
Sep 30 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Saaa wrote:
 Andrei Alexandrescu wrote
 I wonder whether this would be a good topic for TDPL. Currently I'm 
 thinking it's too low-level. I do plan to insert a short section about 
 implementation, just not go deep inside the object model.

 Andrei

I'd really love to see more about implementations, as it makes me twitch to use something whose impact I don't really know. As for using diagrams and other visual presentations: please use them as much as possible; e.g. pointers without arrows are like a film without moving pictures :)

I do have the classic arrow drawings that illustrate how reference semantics work, but by and large I'm not talented at drawing. If anyone in this group has such an inclination and would want to collaborate with me on the book, let me know. Send me your portfolio :o). Andrei
Sep 30 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 I seem to recall that 
 interface dispatch in D does a linear search in the interfaces list, so 
 you may want to repeat your tests with a variable number of interfaces, 
 and a variable position of the interface being used.

Such numbers are not interesting to me. On average, each class I write implements one interface. I rarely use inheritance and interfaces in the same class. But your information is incorrect. Here's what happens:

object of class A
| vtable
| | classinfo pointer
| | methods...
| fields...
| interface vtable
| | struct Interface*
| | methods

struct Interface {
    ptrdiff_t this_offset;
    ClassInfo interfaceInfo;
}

There are two ways to implement interface calls with this paradigm. The compiler way:

interface I {
    void doStuff(int arg);
}

class A : I {
    void doStuff(int arg) { writefln("do stuff! %s", arg); }

    // this method actually goes into the interface vtable
    ReturnType!doStuff __I_doStuff(ParameterTypeTuple!doStuff args) {
        auto iface = cast(Interface*)this.vtable[0];
        this = this + iface.this_offset;
        return doStuff(args);
    }
}

You can also do it with the runtime, but that's a lot harder. It would be effectively the same code.
Sep 29 2009
prev sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 28/09/2009 15:28, Jeremie Pelletier wrote:
 here's a type-safe alternative
 note: untested

 struct Vec3F {
 float[3] v;
 alias v[0] x;
 alias v[1] y;
 alias v[2] z;
 }

 D provides alignment control for structs, why do we need to have a
 separate union construct if it is just a special case of struct
 alignment?

These aliases won't compile, and that was only one out of many union uses.

what other use cases for unions exist that cannot be redesigned in a safer way?
 IMO the use cases for union are very rare and they all can be
 redesigned in a type safe manner.

Not always true.
 when software was small and simple, hand tuning code with low level
 mechanisms (such as unions and even using assembly) made a lot of
 sense. Today's software is typically far more complex and is way too
 big to risk losing safety features for marginal performance gains.

 Micro-optimizations simply don't scale.

Again, that's a lazy view on programming. High level constructs are useful to isolate small and simple algorithms which are implemented at low level.

One way to define programming is "being lazy": you ask the machine to do your work because you are too lazy to do it yourself. Your view above about simple algorithms which are implemented at a low level is exactly where we disagree. Have you ever heard of Stalin (I'm not talking about the dictator)? I was pointing to a trade-off at play here: you can write low-level, hand-optimized code that is hard to maintain and reason about (for example, when providing a formal proof of correctness). You gain some small, non-scalable performance improvements and lose on other fronts, like proving the correctness of your code. The other way is to write high-level, very regular code that can be maintained and is easier to reason about, and leave optimization to the tools. Granted, there could be some initial performance hit compared to the previous approach, but this is more portable: hardware changes do not affect the code, you just need to re-run the tool, and new optimization techniques can be employed by running a newer version of the tool, etc. I should also note that the second approach is already applied by compilers: unless you use inline ASM, the compiler will not use the entire ASM instruction set, which contains special cases for performance tuning.
 These aren't just marginal performance gains, they can easily be up to
 15-30% improvements, sometimes 50% and more. If this is too complex or
 the risk is too high for you then don't use a systems language :)

Your approach makes sense if you are implementing, say, a calculator. It doesn't scale to larger projects. Even C++ has overhead compared to assembly, yet you write performance-critical code in C++, right? Java had a reputation for being slow, yet today performance-critical servers are written in Java rather than C++ in order to get faster execution.
Sep 28 2009
bearophile <bearophileHUGS lycos.com> writes:
Yigal Chripun:

Have you ever heard of Stalin (I'm not talking about the dictator)?

Stalin accepts only a certain subset of Scheme, so you can't use some of the nicest things. And while ShedSkin is slow, Stalin is really slow, so slow that compiling largish programs becomes impractical (I recall times like 100 seconds for 500-line programs; I don't know if such timings have improved in the meantime, I hope so).
 The other way would be to write high-level, very regular code that is
 maintainable and easier to reason about, and leave optimization to the tools.

Life is usually a matter of finding a balance. If you care about performance you don't use Scheme; you use a handy language that doesn't force the compiler to work a LOT, for example C#.
Bye, bearophile
Sep 28 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 29/09/2009 00:31, Nick Sabalausky wrote:
 "Yigal Chripun"<yigal100 gmail.com>  wrote in message
 news:h9r37i$tgl$1 digitalmars.com...
 These aren't just marginal performance gains, they can easily be up to
 15-30% improvements, sometimes 50% and more. If this is too complex or
 the risk is too high for you then don't use a systems language :)

Your approach makes sense if you are implementing, say, a calculator. It doesn't scale to larger projects. Even C++ has overhead compared to assembly, yet you write performance-critical code in C++, right?

It's *most* important on larger projects, because it's only on big systems where small inefficiencies actually add up to a large performance drain. Try writing a competitive real-time graphics renderer or physics simulator (especially for a game console, where you're severely limited in your choice of compiler, if you even have a choice), or something like Pixar's renderer, without *ever* diving into asm, or at least low-level "unsafe" code. And when it inevitably hits some missing optimization in the compiler and runs like shit, try explaining to the dev lead why it's better to beg the compiler vendor to add the optimization you want and wait around hoping they finally do so, instead of just throwing in that inner optimization in the meantime. You can still leave the safe/portable version in there for platforms for which you haven't provided a hand-optimization. And unless you didn't know what you were doing, that inner optimization will still be small and highly isolated. And since it's so small and isolated, not only can you still throw in tests for it, but it's not nearly as hard as you might think to verify its correctness. And if/when your compiler finally does get the optimization you want, you can just rip out the hand-optimization and revert to the "safe/portable" version that you had left in anyway as a fallback.

I think you took my post to an extreme; I actually do agree with the above description. What you just said was basically:
1. Write a portable/safe version.
2. Profile to find the bottlenecks that the tools can't optimize, and optimize only those, while still keeping the portable version.
My objection was to what I felt was Jeremie's description of writing code in a low-level, hand-optimized way from the get-go, instead of what you described in your own words:
 And unless you didn't know
 what you were doing, that inner optimization will still be small and highly
 isolated.
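The workflow Nick and Yigal converge on (keep the portable version, add a hand-optimized path only for a profiled bottleneck, and keep the portable one as the fallback) might look roughly like this in D. It is only a sketch under stated assumptions: `dotPortable`, `dotUnrolled`, `dot` and the `HandOptimizedDot` version identifier are hypothetical names, and a 4-way unrolled loop stands in for the SSE or inline asm a real hot path would use.

```d
// Portable reference version: simple, easy to test, always available.
float dotPortable(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    float sum = 0;              // explicit: D floats default to NaN
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

// "Hand-optimized" version. A 4-way unrolled loop stands in here for
// the SSE or inline asm a real hot path might use.
float dotUnrolled(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= a.length; i += 4)
    {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < a.length; ++i)   // leftover elements
        sum += a[i] * b[i];
    return sum;
}

// Entry point used by the rest of the program. The fast path is opt-in;
// every other build keeps using the portable version.
float dot(const(float)[] a, const(float)[] b)
{
    version (HandOptimizedDot)  // hypothetical version identifier
        return dotUnrolled(a, b);
    else
        return dotPortable(a, b);
}
```

Because the portable version stays in the source, the fast one can be checked against it in a unit test and deleted again once the compiler does the job itself. Reordering float additions can change the last bits of a result, so such comparisons generally need a tolerance; small integer-valued inputs, as in a quick check, are exact in both orders.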

Sep 28 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Yigal Chripun wrote:
 On 29/09/2009 00:31, Nick Sabalausky wrote:
 "Yigal Chripun"<yigal100 gmail.com>  wrote in message
 news:h9r37i$tgl$1 digitalmars.com...
 These aren't just marginal performance gains, they can easily be up to
 15-30% improvements, sometimes 50% and more. If this is too complex or
 the risk is too high for you then don't use a systems language :)

Your approach makes sense if you are implementing, say, a calculator. It doesn't scale to larger projects. Even C++ has overhead compared to assembly, yet you write performance-critical code in C++, right?

It's *most* important on larger projects, because it's only on big systems where small inefficiencies actually add up to a large performance drain. Try writing a competitive real-time graphics renderer or physics simulator (especially for a game console, where you're severely limited in your choice of compiler, if you even have a choice), or something like Pixar's renderer, without *ever* diving into asm, or at least low-level "unsafe" code. And when it inevitably hits some missing optimization in the compiler and runs like shit, try explaining to the dev lead why it's better to beg the compiler vendor to add the optimization you want and wait around hoping they finally do so, instead of just throwing in that inner optimization in the meantime. You can still leave the safe/portable version in there for platforms for which you haven't provided a hand-optimization. And unless you didn't know what you were doing, that inner optimization will still be small and highly isolated. And since it's so small and isolated, not only can you still throw in tests for it, but it's not nearly as hard as you might think to verify its correctness. And if/when your compiler finally does get the optimization you want, you can just rip out the hand-optimization and revert to the "safe/portable" version that you had left in anyway as a fallback.

I think you took my post to an extreme; I actually do agree with the above description. What you just said was basically:
1. Write a portable/safe version.
2. Profile to find the bottlenecks that the tools can't optimize, and optimize only those, while still keeping the portable version.
My objection was to what I felt was Jeremie's description of writing code in a low-level, hand-optimized way from the get-go, instead of what you described in your own words:

That wasn't what I said. I don't hand-optimize everything at a low level; I profile first. Only a few parts *known* to me to require optimization (e.g. matrix multiplication) are written in SSE from the beginning, with a high-level fallback; there just happen to be a lot of them :)
What I argued about was your view on today's software being too big and complex to bother optimizing it.
 And unless you didn't know
 what you were doing, that inner optimization will still be small and 
 highly
 isolated.


Sep 29 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 29/09/2009 16:41, Jeremie Pelletier wrote:

 What I argued about was your view on today's software being too big and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Yigal Chripun wrote:
 On 29/09/2009 16:41, Jeremie Pelletier wrote:
 
 What I argued about was your view on today's software being too big and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.

What's wrong with taking a risk? If you know what you're doing, where is the risk? And if not, how will you learn? If you write your software correctly, you could add countless assembly optimizations and never compromise the security of the entire thing, because these optimizations are isolated; if it crashes there, you have only a narrow area to debug. There are some parts where hand optimizing is almost useless, like network I/O: latency is already so high that faster code won't make a difference. And sometimes the optimization doesn't even need assembly; it just requires a different high-level construct or a different algorithm. The first optimization is to get the most efficient data structures with the most efficient algorithms for a given task, and THEN, if you can't optimize it further, you dig into assembly. People seem to think assembly is something magical and incredibly hard; it's not.
Jeremie
Sep 30 2009
Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 Yigal Chripun wrote:
 On 29/09/2009 16:41, Jeremie Pelletier wrote:

 What I argued about was your view on today's software being too big and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.

What's wrong with taking a risk? If you know what you're doing, where is the risk? And if not, how will you learn? If you write your software correctly, you could add countless assembly optimizations and never compromise the security of the entire thing, because these optimizations are isolated; if it crashes there, you have only a narrow area to debug. There are some parts where hand optimizing is almost useless, like network I/O: latency is already so high that faster code won't make a difference. And sometimes the optimization doesn't even need assembly; it just requires a different high-level construct or a different algorithm. The first optimization is to get the most efficient data structures with the most efficient algorithms for a given task, and THEN, if you can't optimize it further, you dig into assembly. People seem to think assembly is something magical and incredibly hard; it's not.
Jeremie

Also, if you're using asm on something other than a small, simple loop, you're probably doing something badly wrong. Therefore, it should always be localised, and easy to test thoroughly. I don't think local extreme optimisation is a big risk. Greater risks come from using more complicated algorithms. Brute-force algorithms are always the easiest ones to get right <g>.
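Don's point that a hand-optimized routine should be a small, localised loop that is easy to test thoroughly can be illustrated with a differential test: run the fast kernel and an obviously correct brute-force version on many inputs and compare the results. The D sketch below uses only the core language; `popcountReference`, `popcountFast` and `kernelsAgree` are hypothetical names, and a well-known branch-free bit-counting trick (SWAR) stands in for hand-written asm so the example stays portable.

```d
// Brute-force reference: test every bit. Obviously correct, but slow.
uint popcountReference(uint x)
{
    uint n = 0;
    foreach (bit; 0 .. 32)
        n += (x >> bit) & 1;
    return n;
}

// "Optimized" kernel: SWAR bit counting, no loops, no branches.
uint popcountFast(uint x)
{
    x = x - ((x >> 1) & 0x5555_5555);                 // 2-bit partial counts
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // 4-bit partial counts
    x = (x + (x >> 4)) & 0x0F0F_0F0F;                 // 8-bit partial counts
    return (x * 0x0101_0101) >> 24;                   // sum the four bytes
}

// Differential test: compare the kernel against the reference on many
// pseudo-random inputs (xorshift32), plus the obvious edge cases.
bool kernelsAgree(uint samples)
{
    uint x = 0x1234_5678;   // arbitrary non-zero seed
    foreach (_; 0 .. samples)
    {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        if (popcountFast(x) != popcountReference(x))
            return false;
    }
    return popcountFast(0) == 0 && popcountFast(uint.max) == 32;
}
```

Because the kernel is small and has no special cases, a cheap comparison like `kernelsAgree(10_000)` exercises essentially every path through it.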
Sep 30 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Don wrote:
 Jeremie Pelletier wrote:
 Yigal Chripun wrote:
 On 29/09/2009 16:41, Jeremie Pelletier wrote:

 What I argued about was your view on today's software being too big and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.

What's wrong with taking a risk? If you know what you're doing, where is the risk? And if not, how will you learn? If you write your software correctly, you could add countless assembly optimizations and never compromise the security of the entire thing, because these optimizations are isolated; if it crashes there, you have only a narrow area to debug. There are some parts where hand optimizing is almost useless, like network I/O: latency is already so high that faster code won't make a difference. And sometimes the optimization doesn't even need assembly; it just requires a different high-level construct or a different algorithm. The first optimization is to get the most efficient data structures with the most efficient algorithms for a given task, and THEN, if you can't optimize it further, you dig into assembly. People seem to think assembly is something magical and incredibly hard; it's not.
Jeremie

Also, if you're using asm on something other than a small, simple loop, you're probably doing something badly wrong. Therefore, it should always be localised, and easy to test thoroughly. I don't think local extreme optimisation is a big risk.

That's also how I do it once I find the ideal algorithm. I've never had any problems or seen any risk with this technique; I did see some good performance gains, however.
 Greater risks come from using more complicated algorithms. Brute-force 
 algorithms are always the easiest ones to get right <g>.

I'm not sure I agree with that. Those algorithms are pretty isolated and really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier. On the other hand, things like GUI libraries are one big package where unittests are useless most of the time; that's a much greater risk, even with straightforward and trivial code. I read somewhere that the best optimizer is between your ears, and I have yet to see someone or something prove that quote wrong! Besides, how are you going to get comfortable with "complex" stuff if you never play with it? It's really only complex while you're learning it; once your brain has assimilated it, it becomes almost trivial to use.
Sep 30 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
language_fan wrote:
 Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
 
 Don wrote:
 Greater risks come from using more complicated algorithms. Brute-force
 algorithms are always the easiest ones to get right <g>.

really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

Do you recommend writing larger algorithms like a hard real-time distributed (say, for 100+ processes/nodes) garbage collector, or even larger stuff like btrfs or NTFS file system drivers, in assembly? Don't you care about portability? Of course it would be nice to provide an optimal solution for each platform and each use case, but unfortunately TCO-minded managers often do not agree.

Why does everyone associate complexity with assembly? You can write a more complex algorithm in the same language as the original one and get quite a good performance boost (e.g. binary search vs. walking an array). Assembly is only useful for optimization once you have found the optimal algorithm and want to lower its overhead a step further. I don't recommend any language anyway; the base algorithm is often independent of its implementation language. Be it implemented in C#, D or assembly, it's gonna do the same thing at different performance levels. For example, a simple binary search is already faster in D than in, say, JavaScript, and it's even faster in assembly than in D, but that doesn't make your entire program harder to code, nor does it change the logic.
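The "better algorithm in the same language" point can be made concrete with the two searches Jeremie mentions. A minimal D sketch: `linearFind` and `binaryFind` are hypothetical names, both return the index of `needle` or -1, and the binary search assumes the array is sorted in ascending order. In real code, Phobos' `std.range.assumeSorted` provides binary-search queries over a sorted range.

```d
// Walking the array: O(n) comparisons, works on unsorted data.
ptrdiff_t linearFind(const(int)[] haystack, int needle)
{
    foreach (i, v; haystack)
        if (v == needle)
            return cast(ptrdiff_t) i;
    return -1;                                  // not found
}

// Binary search: O(log n) comparisons, but `haystack` must be sorted
// in ascending order.
ptrdiff_t binaryFind(const(int)[] haystack, int needle)
{
    size_t lo = 0, hi = haystack.length;        // half-open range [lo, hi)
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;     // avoids overflow of lo + hi
        if (haystack[mid] < needle)
            lo = mid + 1;
        else if (haystack[mid] > needle)
            hi = mid;
        else
            return cast(ptrdiff_t) mid;
    }
    return -1;                                  // not found
}
```

Both functions implement the same logic (find an element), so swapping one for the other changes the cost, not the meaning, which is the distinction being drawn between algorithm choice and implementation level.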
Sep 30 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
language_fan wrote:
 Wed, 30 Sep 2009 17:05:18 -0400, Jeremie Pelletier thusly wrote:
 
 language_fan wrote:
 Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:

 Don wrote:
 Greater risks come from using more complicated algorithms.
 Brute-force algorithms are always the easiest ones to get right <g>.

and really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

distributed (say, for 100+ processes/nodes) garbage collector, or even larger stuff like btrfs or NTFS file system drivers, in assembly? Don't you care about portability? Of course it would be nice to provide an optimal solution for each platform and each use case, but unfortunately TCO-minded managers often do not agree.

more complex algorithm in the same language as the original one and get quite a good performance boost (e.g. binary search vs. walking an array). Assembly is only useful for optimization once you have found the optimal algorithm and want to lower its overhead a step further. I don't recommend any language anyway; the base algorithm is often independent of its implementation language. Be it implemented in C#, D or assembly, it's gonna do the same thing at different performance levels. For example, a simple binary search is already faster in D than in, say, JavaScript, and it's even faster in assembly than in D, but that doesn't make your entire program harder to code, nor does it change the logic.

Well, I meant that we can assume the algorithm choice is already optimal. Porting the high-level program to assembly tends to grow the line count quite a bit. For instance, I have experience converting Java code to Scala, and C++ to Haskell; in both cases the LOC decreased by about 50-90%. If you convert things like foreach, ranges, complex expressions, lambdas, and scope() constructs to assembly, the line count will increase by at least one order of magnitude. Reading the lower-level code is much harder, and you lose important safety nets like the type system.

Yeah, but I don't rate my code by the number of lines I write, but rather by how well it performs :) I usually only go into assembly after profiling, or when I know from the start it's gonna be faster, such as matrix multiplication. If lines of code were more important than performance, you'd get entire OSes and all their programs written in JavaScript, and you'd wait 20 minutes for your computer to boot.
Sep 30 2009
Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 Don wrote:
 Jeremie Pelletier wrote:
 Yigal Chripun wrote:
 On 29/09/2009 16:41, Jeremie Pelletier wrote:

 What I argued about was your view on today's software being too big 
 and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.

What's wrong with taking a risk? If you know what you're doing, where is the risk? And if not, how will you learn? If you write your software correctly, you could add countless assembly optimizations and never compromise the security of the entire thing, because these optimizations are isolated; if it crashes there, you have only a narrow area to debug. There are some parts where hand optimizing is almost useless, like network I/O: latency is already so high that faster code won't make a difference. And sometimes the optimization doesn't even need assembly; it just requires a different high-level construct or a different algorithm. The first optimization is to get the most efficient data structures with the most efficient algorithms for a given task, and THEN, if you can't optimize it further, you dig into assembly. People seem to think assembly is something magical and incredibly hard; it's not.
Jeremie

Also, if you're using asm on something other than a small, simple loop, you're probably doing something badly wrong. Therefore, it should always be localised, and easy to test thoroughly. I don't think local extreme optimisation is a big risk.

That's also how I do it once I find the ideal algorithm. I've never had any problems or seen any risk with this technique; I did see some good performance gains, however.
 Greater risks come from using more complicated algorithms. Brute-force 
 algorithms are always the easiest ones to get right <g>.

I'm not sure I agree with that. Those algorithms are pretty isolated and really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

By "riskier" I mean "more chance of containing an error". I'm partly basing this on my recent experience writing BigInt. The low-level asm routines are easy to get right, and it's easy to tell when you've got them wrong. They do brute-force stuff, like schoolbook O(n^2) multiplication, and importantly, _there are no special cases_, because they need to be fast. But the higher-level O(n^1.3) multiplication algorithms are full of special cases, and that's where the bugs are.
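Don's description of the low-level kernel, a brute-force schoolbook multiplication whose loop runs the same way for every input, can be sketched in plain D. This is an illustration under stated assumptions, not Phobos' actual BigInt code (which uses asm): numbers are hypothetically stored as little-endian arrays of 32-bit limbs, and `mulSchoolbook` is a made-up name.

```d
// Schoolbook O(n*m) multiplication of little-endian arrays of 32-bit limbs.
// Deliberately simple: the same nested loop runs for every input,
// with no special cases.
uint[] mulSchoolbook(const(uint)[] a, const(uint)[] b)
{
    auto result = new uint[a.length + b.length]; // D zero-initializes this
    foreach (i, ai; a)
    {
        ulong carry = 0;
        foreach (j, bj; b)
        {
            // 32x32-bit product + existing digit + carry fits in 64 bits.
            immutable ulong t = cast(ulong) ai * bj + result[i + j] + carry;
            result[i + j] = cast(uint) t;   // keep the low 32 bits
            carry = t >> 32;                // propagate the high 32 bits
        }
        // This slot has not been written by any earlier row, so plain
        // assignment (not addition) is correct here.
        result[i + b.length] = cast(uint) carry;
    }
    return result;
}
```

The carry arithmetic is the only subtle part: a 32x32-bit product plus one existing digit plus a carry is at most 2^64 - 1, so it always fits in a `ulong`. Any case analysis (operand sizes, thresholds, splitting) would live in the higher-level algorithms that call a kernel like this.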
Oct 01 2009
Don <nospam nospam.com> writes:
language_fan wrote:
 Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
 
 Don wrote:
 Greater risks come from using more complicated algorithms. Brute-force
 algorithms are always the easiest ones to get right <g>.

really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

Do you recommend writing larger algorithms like a hard real-time distributed (say, for 100+ processes/nodes) garbage collector, or even larger stuff like btrfs or NTFS file system drivers, in assembly? Don't you care about portability? Of course it would be nice to provide an optimal solution for each platform and each use case, but unfortunately TCO-minded managers often do not agree.

You deal with this by ensuring that you have a clear division between "simple but needs to be as fast as possible" (which you do low-level optimisation on) and "complicated, but less speed critical". It's a classic problem of separation of concerns: you need to ensure that no piece of code has requirements to be fast AND clever at the same time. Incidentally, it's usually not possible to make something optimally fast unless it's really simple. So no, you should never do something complicated in asm.
Oct 01 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 30/09/2009 16:53, Jeremie Pelletier wrote:
 Yigal Chripun wrote:
 On 29/09/2009 16:41, Jeremie Pelletier wrote:

 What I argued about was your view on today's software being too big and
 complex to bother optimizing it.

That is not what I said. I was saying that hand-optimized code needs to be kept to a minimum and used only for visible bottlenecks, because the risk of introducing low-level unsafe code is greater in bigger, more complex software.

What's wrong with taking a risk? If you know what you're doing, where is the risk? And if not, how will you learn? If you write your software correctly, you could add countless assembly optimizations and never compromise the security of the entire thing, because these optimizations are isolated; if it crashes there, you have only a narrow area to debug. There are some parts where hand optimizing is almost useless, like network I/O: latency is already so high that faster code won't make a difference. And sometimes the optimization doesn't even need assembly; it just requires a different high-level construct or a different algorithm. The first optimization is to get the most efficient data structures with the most efficient algorithms for a given task, and THEN, if you can't optimize it further, you dig into assembly. People seem to think assembly is something magical and incredibly hard; it's not.
Jeremie

When I said optimizing, I meant lowering the implementation level by using lower-level language constructs (pointers vs. references, for example) and asm instead of D. Assume that the choice of algorithm and data structures is optimal. Like language_fan wrote, when you lower the level you increase your LOC and you lose all sorts of safety features. Statistically speaking, there's about one bug per 2000 LOC on average, so you also increase the chance of a bug. All of that together means a higher risk. Your asm implementation of binary search could be slightly faster than a comparable Haskell implementation, but the latter would be much easier to formally prove correct. I don't know about you, but I prefer hospital equipment, airplanes, cars, etc. to be correct, even if they're a couple of percent slower.
Sep 30 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 11:23 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 
 There is no such thing as "not being able to happen" :)

 Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

 I seem to be the only one who sees Walter's side of things in this thread
 :o)

Why the hell would the compiler allow that to begin with? Why bother implementing non-null references only to allow the entire system to be broken?

Because D is a practical language that lets the programmer do whatever he wants, even shoot himself in the foot if he wants to. Doing so just isn't as implicit as in C. Walter understands there are some cases where you want to override the type system; that's why casts are in D. Too many optimizations rely on them.
Sep 26 2009
downs <default_357-line yahoo.de> writes:
Jeremie Pelletier wrote:
 Jarrett Billingsley wrote:
 On Sat, Sep 26, 2009 at 11:23 PM, Jeremie Pelletier
 <jeremiep gmail.com> wrote:

 There is no such thing as "not being able to happen" :)

 Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

 I seem to be the only one who sees Walter's side of things in this
 thread
 :o)

Why the hell would the compiler allow that to begin with? Why bother implementing non-null references only to allow the entire system to be broken?

Because D is a practical language that lets the programmer do whatever he wants, even shoot himself in the foot if he wants to. Doing so just isn't as implicit as in C. Walter understands there are some cases where you want to override the type system; that's why casts are in D. Too many optimizations rely on them.

Sure, but if you set out to break it, the compiler really can't (or shouldn't) help you. This whole debate, as far as I know, is about defaults, i.e. preventing *unintentional* nulls.
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 The only time I've had a 
 problem finding where a null came from (because they tend to fail very 
 close to their initialization point) is when the null was caused by 
 another memory corruption problem. Non-nullable references won't 
 mitigate that.

There are also ways to reduce the number/probability of memory corruptions in a C-like language: memory regions, region analysis, etc. We can discuss this too, but it's another topic.
Bye, bearophile
Sep 26 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 Daniel Keep wrote:
 "But the user will just assign to something useless to get around that!"

 You mean like how everyone wraps every call in try{...}catch(Exception
 e){} to shut the damn exceptions up?

They do just that in Java because of the checked-exceptions thing. I have a reference to Bruce Eckel's essay on it somewhere in this thread. The observation in the article was it wasn't just moron idiot programmers doing this. It was the guru programmers doing it, all the while knowing it was the wrong thing to do. The end result was the feature actively created the very problems it was designed to prevent.

Checked exceptions are a bad example: you can't not use them. No one is proposing to remove null from the language. If we WERE, you would be quite correct. But we're not. If someone doesn't want to use non-null references, then they don't use them.
 Or uses pointer arithmetic and
 casts to get at those pesky private members?

That's entirely different, because privacy is selected by the programmer, not the language. I don't have any issue with a user-defined type that is non-nullable (Andrei has designed a type constructor for that).

Good grief, that's what non-null references are!

  Object foo = new Object;
      // Dear Mr. Compiler, I would like a non-nullable
      // reference to an Object, please!  Here's the object
      // I want you to use.

  Object? bar;
      // Dear Mr. Compiler, I would like a nullable reference
      // to an object, please!  Just initialise with null, thanks.

How is that not selected by the programmer? The programmer is in complete control. We are not asking for the language to unilaterally declare null to be a sin; we want to be given the choice to say we don't want it!

Incidentally, on the subject of non-null as a UDT, that would be a largely acceptable solution for me. The trouble is that in order to do it, you'd need to be able to block default initialisation, **which is precisely what you're arguing against**. You can't have it both ways.
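Daniel's point that a library non-null type needs a way to block default initialisation can be made concrete. D2 lets a struct disable its default constructor with `@disable this();`, so a minimal wrapper might look like the sketch below. `NonNull` is a hypothetical name, not a Phobos type and not necessarily the type constructor Andrei designed, and the guarantee is not airtight: `NonNull!T.init` still holds a null reference.

```d
// Hypothetical wrapper: a class reference that is checked for null once,
// at the point where it is stored, instead of wherever it is used later.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this();    // `NonNull!Foo x;` with no initializer won't compile

    this(T value)
    {
        if (value is null)
            throw new Exception("null passed to NonNull!" ~ T.stringof);
        payload = value;
    }

    @property T get() { return payload; }

    alias get this;     // usable wherever a plain T is expected
}
```

Usage: `auto n = NonNull!Object(new Object);` compiles and works like an `Object`, while `NonNull!Object n;` is rejected at compile time, and `NonNull!Object(null)` fails at the line that tries to store the null rather than at some later dereference.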
 If someone is actively trying to break the type system, it's their
 goddamn fault!  Honestly, I don't care about the hacks they employ to
 defeat the system because they're going to go around blindly shooting
 themselves in the foot no matter what they do.

True, but it's still not a good idea to design a language feature that winds up, in reality, encouraging bad programming practice. It encourages bad practice in a way that is really, really hard to detect in a code review.

Whether or not it encourages it is impossible to determine at this juncture because I can't think of a language comparable to D that has it. Things that are "like" it don't count. Ignoring that, you're correct that if someone decides to abuse non-null references, it's going to be less than trivial to detect.
 I like programming mistakes to be obvious, not subtle. There's nothing
 subtle about a null pointer exception. There's plenty subtle about the
 wrong default value.

I think this is a fallacy. You're assuming a person who is actively going out of their way to misuse the type system. I'll repeat myself:

  Foo bar = arbitrary_default;

is harder to do than

  Foo? bar;

which does exactly what they want: it relieves them of the need to initialise, and gives a relatively safe default value.

I mean, people could abuse a lot of things in D. Pointers, certainly. DEFINITELY inline assembler. But we don't get rid of them, because at some point you have to say "you know what? If you're going to play with fire, that's your own lookout." The only way you're ever going to have a language that's actually safe no matter how ignorant, stupid or just outright suicidal the programmer is would be to implement a compiler for SIMPLE: http://esoteric.voxelperfect.net/wiki/SIMPLE
 And what about the people who AREN'T complete idiots, who maybe
 sometimes just accidentally trip and would quite welcome a safety rail
 there?

Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

Really?

"I used to work at Boeing designing critical flight systems. Absolutely the WRONG failure mode is to **pretend nothing went wrong** and happily return **default values** and show lovely green lights on the instrument panel. The right thing is to **immediately inform the pilot that something went wrong and INSTANTLY SHUT THE BAD SYSTEM DOWN** before it does something really, really bad, because now it is in an unknown state. The pilot then follows the procedure he's trained to, such as engage the backup."

Think of the compiler as the autopilot. Pretending nothing went wrong is passing a null into a function that doesn't expect it, or shoving it into a field that's not meant to be null. Null IS a happy default value that can be passed around without consequence from the type system. Immediately informing the pilot is refusing to compile because the code looks like it's doing something wrong.

An NPE is the thermonuclear option of error handling. Your program blows up, tough luck, try again. Debugging is forensics, just like picking through a mound of dead bodies and bits of fuselage; if it's come to that, there's a problem. Non-nullable references are the compiler (or autopilot) putting up the red flag and saying "are you really sure you want to do this? I mean, it LOOKS wrong to me!"
 Finally, let me re-post something I wrote the last time this came up:

 The problem with null dereference problems isn't knowing that they're
 there: that's the easy part.  You helpfully get an exception to the
 face when that happens. The hard part is figuring out *where* the
 problem originally occurred. It's not when the exception is thrown
 that's the issue; it's the point at which you placed a null reference
 in a slot where you shouldn't have.


It's a lot harder to track down a bug when the bad initial value gets combined with a lot of other data first. The only time I've had a problem finding where a null came from (because they tend to fail very close to their initialization point) is when the null was caused by another memory corruption problem. Non-nullable references won't mitigate that.

Only when the nulls are assigned and used locally.

I've had code before where a null accidentally snuck into an object through a constructor that was written before the field existed. The object gets passed around. No problem; it's not null. It gets stored inside other things, pulled out. The field itself is pulled out and passed around, put into other things. And THEN the program blows up.

You can't run a debugger backwards through time, and that's what you'd need to do to figure out where the bloody thing came from. The NPE tells you there is a problem, but it doesn't tell you WHY or WHERE.

It's your leg dropping off from necrosis and the doctor going "gee, I guess you're sick." It's the plane smashing into the ground and killing everyone inside, a specialised team spending a month analysing the wreckage and saying "well, this screw came loose but BUGGERED if we can work out why." Then, after several more crashes, someone finally realises that it didn't come loose, it was never there to begin with. "Oh! THAT'S why they keep crashing!" "Gee, would've been nice if the plane wouldn't have taken off without it."
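One way to get the failure reported where the null enters, short of non-nullable types, is to validate reference arguments in the constructor. Below is a minimal D sketch of that idea; `Order`, `Customer` and `label` are hypothetical names, not code from this thread. `std.exception.enforce` throws an `Exception` at the `new Order(...)` call, so the stack trace points at the construction site rather than at a distant dereference. It remains a runtime check, though, which is the gap non-nullable references are meant to close.

```d
import std.exception : enforce;

// Hypothetical referenced type.
class Customer
{
    string name;
    this(string name) { this.name = name; }
}

// Hypothetical class whose `customer` field could otherwise "sneak in" as null.
class Order
{
    private int id;
    private Customer customer;

    // Reject a null where it enters the object, instead of storing it and
    // letting it blow up later in some unrelated piece of code.
    this(int id, Customer customer)
    {
        enforce(customer !is null, "Order: customer must not be null");
        this.id = id;
        this.customer = customer;
    }

    // Safe to dereference: the constructor guarantees a non-null customer,
    // and the field is private, so outside code cannot overwrite it.
    string label()
    {
        return customer.name;
    }
}
```

With this in place, `new Order(2, null)` throws immediately, while a valid order behaves normally.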
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Daniel Keep wrote:
 
 Walter Bright wrote:
 Daniel Keep wrote:
 "But the user will just assign to something useless to get around that!"

 You mean like how everyone wraps every call in try{...}catch(Exception
 e){} to shut the damn exceptions up?

have a reference to Bruce Eckel's essay on it somewhere in this thread. The observation in the article was it wasn't just moron idiot programmers doing this. It was the guru programmers doing it, all the while knowing it was the wrong thing to do. The end result was the feature actively created the very problems it was designed to prevent.

up, tough luck, try again. Debugging is forensics, just like picking through a mound of dead bodies and bits of fuselage; if it's come to that, there's a problem. It's your leg dropping off from necrosis and the doctor going "gee, I guess you're sick." It's the plane smashing into the ground and killing everyone inside, a specialised team spending a month analysing the wreckage and saying "well, this screw came loose but BUGGERED if we can work out why." Then, after several more crashes, someone finally realises that it didn't come loose, it was never there to begin with. "Oh! THAT'S why they keep crashing!" "Gee, would've been nice if the plane wouldn't have taken off without it."

I like your analogies. :)
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Ary Borenszweig wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Daniel Keep wrote:
 "But the user will just assign to something useless to get around 
 that!"

 You mean like how everyone wraps every call in try{...}catch(Exception
 e){} to shut the damn exceptions up?

have a reference to Bruce Eckel's essay on it somewhere in this thread. The observation in the article was it wasn't just moron idiot programmers doing this. It was the guru programmers doing it, all the while knowing it was the wrong thing to do. The end result was the feature actively created the very problems it was designed to prevent.

up, tough luck, try again. Debugging is forensics, just like picking through a mound of dead bodies and bits of fuselage; if it's come to that, there's a problem. It's your leg dropping off from necrosis and the doctor going "gee, I guess you're sick." It's the plane smashing into the ground and killing everyone inside, a specialised team spending a month analysing the wreckage and saying "well, this screw came loose but BUGGERED if we can work out why." Then, after several more crashes, someone finally realises that it didn't come loose, it was never there to begin with. "Oh! THAT'S why they keep crashing!" "Gee, would've been nice if the plane wouldn't have taken off without it."

I like your analogies. :)

I also do, but try and picture a plane sophisticated to the point it can notice missing screws, and ask yourself the following question: what is making sure such a screw detection system works correctly? That's really just taking a problem and sending it to another team to solve; at the end of the day, it's still a problem. Besides, explosions are cool!
Sep 26 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 10:41 PM, Walter Bright
<newshound1 digitalmars.com> wrote:

 And what about the people who AREN'T complete idiots, who maybe
 sometimes just accidentally trip and would quite welcome a safety rail
 there?

Null pointer seg faults *are* a safety rail. They keep an errant program from causing further damage.

If you haven't crawled out from under your rock in the last twenty years, I'd like to point out that the accepted definition of safety and program correctness has changed since then.
Sep 26 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Sep 26, 2009 at 11:23 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:

 There is no such thing as "not being able to happen" :)

 Object thisCannotPossiblyBeNullInAnyWayWhatsoever = cast(Object)null;

 I seem to be the only one who sees Walter's side of things in this thread
 :o)

Why the hell would the compiler allow that to begin with? Why bother implementing non-null references only to allow the entire system to be broken?
Sep 26 2009
language_fan <foo bar.com.invalid> writes:
Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:

 Don wrote:
 Greater risks come from using more complicated algorithms. Brute-force
 algorithms are always the easiest ones to get right <g>.

I'm not sure I agree with that. Those algorithms are pretty isolated and really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

Do you recommend writing larger algorithms, like a hard real-time distributed garbage collector (say, for 100+ processes/nodes), or even larger things like btrfs or NTFS file system drivers, in assembly? Don't you care about portability? Of course it would be nice to provide an optimal solution for each platform and each use case, but unfortunately TCO-minded managers often disagree.
Sep 30 2009
language_fan <foo bar.com.invalid> writes:
Wed, 30 Sep 2009 17:05:18 -0400, Jeremie Pelletier thusly wrote:

 language_fan wrote:
 Wed, 30 Sep 2009 12:05:29 -0400, Jeremie Pelletier thusly wrote:
 
 Don wrote:
 Greater risks come from using more complicated algorithms.
 Brute-force algorithms are always the easiest ones to get right <g>.

and really easy to write unittests for, so I don't see where the risk is when writing more complex algorithms; it's obviously harder, but not riskier.

Do you recommend writing larger algorithms, like a hard real-time distributed garbage collector (say, for 100+ processes/nodes), or even larger things like btrfs or NTFS file system drivers, in assembly? Don't you care about portability? Of course it would be nice to provide an optimal solution for each platform and each use case, but unfortunately TCO-minded managers often disagree.

Why does everyone associate complexity with assembly? You can write a more complex algorithm in the same language as the original one and get quite a good performance boost (e.g. binary search vs. walking an array). Assembly is only useful for optimization once you have found the optimal algorithm and want to lower its overhead a step further. I don't recommend any language anyway; the base algorithm is often independent of its implementation language. Whether it is implemented in C#, D or assembly, it's going to do the same thing at different performance levels. For example, a simple binary search is already faster in D than in, say, JavaScript, and it's even faster in assembly than in D, but that doesn't make your entire program harder to code, nor does it change the logic.

Well, I meant that we can assume the algorithm choice is already optimal. Porting the high-level program to assembly tends to grow the line count quite a bit. For instance, I have experience converting Java code to Scala, and C++ to Haskell. In both cases the LOC decreased by about 50-90%. If you convert things like foreach, ranges, complex expressions, lambdas, and scope() constructs to assembly, the line count will increase by at least an order of magnitude. Reading the lower-level code is much harder. And you lose important safety nets like the type system.
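To make the binary-search-versus-walking-an-array example concrete, here is a minimal D sketch (the function names are illustrative, not from the thread). Both functions return the index of `needle` or -1; the binary version assumes the slice is sorted in ascending order and does O(log n) comparisons instead of O(n), while staying in D rather than dropping to assembly.

```d
/// Linear scan: O(n) comparisons, works on unsorted data.
ptrdiff_t linearFind(const(int)[] haystack, int needle)
{
    foreach (i, value; haystack)
        if (value == needle)
            return cast(ptrdiff_t) i;
    return -1;
}

/// Binary search: O(log n) comparisons; assumes `haystack` is sorted ascending.
ptrdiff_t binaryFind(const(int)[] haystack, int needle)
{
    size_t lo = 0, hi = haystack.length; // search the half-open range [lo, hi)
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
        if (haystack[mid] < needle)
            lo = mid + 1;
        else if (haystack[mid] > needle)
            hi = mid;
        else
            return cast(ptrdiff_t) mid;
    }
    return -1;
}
```

The more complex function is still small and easy to unit-test against the simple one. In practice, Phobos's `std.range.assumeSorted` wraps a sorted range in a `SortedRange` whose `contains` uses binary search, which avoids hand-rolling the loop.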
Sep 30 2009
"Dejan Lekic" <dejan.lekic gmail.com> writes:
Walter, is that article publicly available?
Oct 02 2009
"Dejan Lekic" <dejan.lekic gmail.com> writes:
Thanks Don! \o/
Oct 02 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 
 
  >
  >
  > :)
 
 I think he's wrong.

Please, please, please, do some fun little project in Java or C# and drop the idea of initializing variables whenever you declare them. Just leave them like this:

    int i;

and then later initialize them when you need them, for example with different values depending on some conditions. Then you'll realize how powerful is having the compiler stop variables that are not initialized *in the context of a function, not necessarily in the same line of their declaration*. It's always a win: you get a compile-time error, you don't have to wait to get an error at runtime.

Until you do that, you won't understand what most people are answering to you. But I know what you'll answer. You'll say "what about pointers?", "what about ref parameters?", "what about out parameters?", and then someone will say to you "C# has them", etc, etc. No point discussing non-null variables without also having the compiler stop uninitialized variables.
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Ary Borenszweig wrote:
 Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 

  >
  >
  > :)

 I think he's wrong.

Please, please, please, do some fun little project in Java or C# and drop the idea of initializing variables whenever you declare them. Just leave them like this: int i; and then later initialize them when you need them, for example different values depending on some conditions. Then you'll realize how powerful is having the compiler stop

I meant "spot"
Sep 26 2009
bearophile <bearophileHUGS lycos.com> writes:
Ary Borenszweig:

 Please, please, please, do some fun little project in Java or C# and 
 drop the idea of initializing variables whenever you declare them. Just 
 leave them like this:

 Until you do that, you won't understand what most people are answering 
 to you.

Something similar happens in other fields too. I have had long discussions with non-biologists about evolutionary matters. Later I understood that those discussions weren't very useful; the best way for them to understand why and how evolution happens is to do a week of field ethology, studying how insects on a wild lawn interact, compete, fight and cooperate with each other. If you have an expert who shows you things, you can see a lot in just a week. At that point you have a common frame of reference that allows you to understand how evolution happens :-) Practical experience is important.

Bye,
bearophile
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Ary Borenszweig wrote:
 Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 

  >
  >
  > :)

 I think he's wrong.

Please, please, please, do some fun little project in Java or C# and drop the idea of initializing variables whenever you declare them. Just leave them like this: int i; and then later initialize them when you need them, for example different values depending on some conditions. Then you'll realize how powerful is having the compiler stop variables that are not initialized *in the context of a function, not necessarily in the same line of their declaration*. It's always a win: you get a compile time error, you don't have to wait to get an error at runtime. Until you do that, you won't understand what most people are answering to you. But I know what you'll answer. You'll say "what about pointers?", "what about ref parameters?", "what about out parameters?", and then someone will say to you "C# has them", etc, etc. No point discussing non-null variables without also having the compiler stop uninitialized variables.

All null values are uninitialized, but not all initializers are null, especially the void initializer. You can't always rely on initializers in your algorithms, but you can always rely on null. Kinda like how all pointers are references, but not all references are pointers: you can't do pointer arithmetic on references.
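For reference, a small D sketch of what the different initializers give you (the `Defaults` struct, `Node` class and `sumWithVoidInit` function are illustrative names). Without an explicit initializer, a class reference is `null`, an `int` is 0, a `double` is NaN and a `char` is 0xFF; `= void` skips initialization entirely, so the variable must be written before it is read.

```d
import std.math : isNaN;

class Node {}

/// Fields left without initializers get their type's `.init` value.
struct Defaults
{
    Node reference; // class reference: defaults to null
    int integer;    // defaults to 0
    double floating; // defaults to NaN, so misuse shows up in results
    char character; // defaults to 0xFF, an invalid UTF-8 code unit
}

/// `= void` leaves the variable uninitialized; it must be assigned before use.
int sumWithVoidInit(const(int)[] values)
{
    int total = void; // no default write happens here
    total = 0;        // reading before this assignment would be undefined
    foreach (v; values)
        total += v;
    return total;
}
```

Only the class reference default is something a dereference will trap on; the other defaults either look like valid data (0) or propagate quietly (NaN).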
Sep 26 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 

  >
  >
  > :)

 I think he's wrong.

Please, please, please, do some fun little project in Java or C# and drop the idea of initializing variables whenever you declare them. Just leave them like this: int i; and then later initialize them when you need them, for example different values depending on some conditions. Then you'll realize how powerful is having the compiler stop variables that are not initialized *in the context of a function, not necessarily in the same line of their declaration*. It's always a win: you get a compile time error, you don't have to wait to get an error at runtime. Until you do that, you won't understand what most people are answering to you. But I know what you'll answer. You'll say "what about pointers?", "what about ref parameters?", "what about out parameters?", and then someone will say to you "C# has them", etc, etc. No point discussing non-null variables without also having the compiler stop uninitialized variables.

All null values are uninitialized, but not all initializers are null, especially the void initializer.

I don't see your point here. "new Object()" is not a null initializer, nor is "1"... so?

 You can't always rely on initializers
 in your algorithms, you can always rely on null.

Yes, I can always rely on initializers in my algorithm. I can, if the compiler lets me safely initialize them whenever I want, not necessarily in the line I declare them. Just out of curiosity: have you ever programmed in Java or C#?
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Ary Borenszweig wrote:
 Jeremie Pelletier wrote:
 Ary Borenszweig wrote:
 Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 

  >
  >
  > :)

 I think he's wrong.

Please, please, please, do some fun little project in Java or C# and drop the idea of initializing variables whenever you declare them. Just leave them like this: int i; and then later initialize them when you need them, for example different values depending on some conditions. Then you'll realize how powerful is having the compiler stop variables that are not initialized *in the context of a function, not necessarily in the same line of their declaration*. It's always a win: you get a compile time error, you don't have to wait to get an error at runtime. Until you do that, you won't understand what most people are answering to you. But I know what you'll answer. You'll say "what about pointers?", "what about ref parameters?", "what about out parameters?", and then someone will say to you "C# has them", etc, etc. No point discussing non-null variables without also having the compiler stop uninitialized variables.

All null values are uninitialized, but not all initializers are null, especially the void initializer.

I don't see your point here. "new Object()" is not a null initializer, nor is "1"... so?

Object o = void;
  You can't always rely on initializers
 in your algorithms, you can always rely on null.

Yes, I can always rely on initializers in my algorithm. I can, if the compiler lets me safely initialize them whenever I want, not necessarily in the line I declare them. Just out of curiosity: have you ever programmed in Java or C#?

Nope, never got interested in these, to tell the truth. I've only done C, C++, D and x86 assembly in systems programming, and I have quite a background in PHP and JavaScript too. I've played with a lot of languages, but those are the ones I use on a daily basis. I would like to get into Python or Ruby someday; I only hear good things about those two. I know Lua has less overhead than Python, but it's more of a support language for adding easy scripting on top of C than a standalone language. I already have my Lua bindings for D ready to do just that. I like extremes :)
Sep 26 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
language_fan wrote:
 Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:
 
 Ary Borenszweig wrote:
 Just out of curiosity: have you ever programmed in Java or C#?

C++, D and x86 assembly in systems programming, I have quite a background in PHP and JavaScript also.

So you only know imperative procedural programming + some features of hybrid OOP languages that are not even proper OOP languages.

This is what I know best, yeah. I did a lot of work in functional programming too, but not enough to add it to the above list. What is proper OOP anyway? It's a feature offered by the language, not a critical design that must obey some strict standard rules. Be it class based or prototype based, supporting single or multiple inheritance, using abstract base classes or interfaces, having funny syntax for ctors and whatnot or using the class name or even 'this', it's still OOP. If you want to call me on not knowing 15 languages like you do, I have to call you on not knowing the differences in OOP models.
 I played with a lot of languages, but those are the ones I use on a
 daily basis. I would like to get into Python or Ruby someday, I only
 hear good things about these two. I know Lua has less overhead than
 Python

Oh, the only difference between Lua and Python is the overhead?! That's a... pretty performance-oriented view of languages.

Yes, I have a performance-oriented view: I write a lot of real-time code, and I hate unresponsive code in general. Now, I didn't say it was the only difference; what I said is that it's one that influences a lot of companies and people to pick Lua over Python for scripting.
 I like extremes :)

If you like extremes, why have you not programmed in Haskell or Coq? Too scary? You are often arguing against languages and concepts you have never used. The other people here who make these suggestions are more experienced with various languages.

I meant extremes as in full machine control / no control whatsoever, not in language semantics :) I just haven't found a use for Haskell or Coq for what I do yet.
Sep 27 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
language_fan wrote:
 Sun, 27 Sep 2009 12:35:23 -0400, Jeremie Pelletier thusly wrote:
 
 language_fan wrote:
 Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:

 Ary Borenszweig wrote:
 Just out of curiosity: have you ever programmed in Java or C#?

C++, D and x86 assembly in systems programming, I have quite a background in PHP and JavaScript also.

hybrid OOP languages that are not even proper OOP languages.

programming too, but not enough to add it to the above list. What is proper OOP anyway? It's a feature offered by the language, not a critical design that must obey some strict standard rules. Be it class based or prototype based, supporting single or multiple inheritance, using abstract base classes or interfaces, having funny syntax for ctors and whatnot or using the class name or even 'this', it's still OOP. If you want to call me on not knowing 15 languages like you do, I have to call you on not knowing the differences in OOP models.

I must say I have not studied languages that much, only the concepts and theory, starting from formal definitions like operational or denotational semantics, and some more informal ones. I can professionally write code in only about half a dozen languages, but learning new ones is trivial if the task requires it.

Generally, the common thing in proper pure OOP languages is the 'everything is an object' mentality. Because of this property there is no strict distinction between primitive non-OOP types and OOP types in pure OOP languages. In some languages, e.g., number values are objects. In others there are no static members and even classes are objects, so-called meta-objects. In some ways you can see this purity even in UML. If we go into details, various OOP languages have major differences in their semantics.

What I meant above is that I know a lot of developers who have a similar background to yours. It is really easy to use all of those languages without actually using their OOP features, at least not properly (for instance, PHP does not even have a real OOP system; it is a cheap rip-off of mainstream languages, just look at the scoping rules). I have seen Java code where the developer never constructs new objects and only uses static methods because he fears heap allocation is expensive.

Discussing OOP and language concepts is really hard if you lack the theoretical underpinning. It is sad to say, but the best sources for this knowledge are academic CS books, though nowadays even Wikipedia is starting to have good articles on the subject.

I agree; Wikipedia is often the first source I check to learn about different concepts, then I search for online papers and documentation, dig into source code (Google's code search is a gem), and finally books. I'm not most programmers, and I'm sure you aren't either. I like to learn as much of the semantics and implementation details behind a language as I can; only then do I feel I know the language. I like to make the best of everything in the languages I use, not specialize in a subset of them. I don't believe in a perfect programming model; I believe in many different models, each with its pros and cons, that can live in the same language, forming an all-around solution. That's why I usually stay away from 'pure' languages: they impose a single point of view of the world. That doesn't mean it's a bad one; I just like to look at the world from different angles at the same time.
Sep 27 2009
language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:

 Ary Borenszweig wrote:
 Just out of curiosity: have you ever programmed in Java or C#?

Nope, never got interested in these to tell the truth. I only did C, C++, D and x86 assembly in systems programming, I have quite a background in PHP and JavaScript also.

So you only know imperative procedural programming + some features of hybrid OOP languages that are not even proper OOP languages.
 
 I played with a lot of languages, but those are the ones I use on a
 daily basis. I would like to get into Python or Ruby someday, I only
 hear good things about these two. I know Lua has less overhead than
 Python

Oh, the only difference between Lua and Python is the overhead?! That's a... pretty performance-oriented view of languages.
 I like extremes :)

If you like extremes, why have you not programmed in Haskell or Coq? Too scary? You are often arguing against languages and concepts you have never used. The other people here who make these suggestions are more experienced with various languages.
Sep 27 2009
language_fan <foo bar.com.invalid> writes:
Sun, 27 Sep 2009 12:35:23 -0400, Jeremie Pelletier thusly wrote:

 language_fan wrote:
 Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:
 
 Ary Borenszweig wrote:
 Just out of curiosity: have you ever programmed in Java or C#?

C++, D and x86 assembly in systems programming, I have quite a background in PHP and JavaScript also.

So you only know imperative procedural programming + some features of hybrid OOP languages that are not even proper OOP languages.

This is what I know best, yeah. I did a lot of work in functional programming too, but not enough to add it to the above list. What is proper OOP anyway? It's a feature offered by the language, not a critical design that must obey some strict standard rules. Be it class based or prototype based, supporting single or multiple inheritance, using abstract base classes or interfaces, having funny syntax for ctors and whatnot or using the class name or even 'this', it's still OOP. If you want to call me on not knowing 15 languages like you do, I have to call you on not knowing the differences in OOP models.

I must say I have not studied languages that much, only the concepts and theory, starting from formal definitions like operational or denotational semantics, and some more informal ones. I can professionally write code in only about half a dozen languages, but learning new ones is trivial if the task requires it.

Generally, the common thing in proper pure OOP languages is the 'everything is an object' mentality. Because of this property there is no strict distinction between primitive non-OOP types and OOP types in pure OOP languages. In some languages, e.g., number values are objects. In others there are no static members and even classes are objects, so-called meta-objects. In some ways you can see this purity even in UML. If we go into details, various OOP languages have major differences in their semantics.

What I meant above is that I know a lot of developers who have a similar background to yours. It is really easy to use all of those languages without actually using their OOP features, at least not properly (for instance, PHP does not even have a real OOP system; it is a cheap rip-off of mainstream languages, just look at the scoping rules). I have seen Java code where the developer never constructs new objects and only uses static methods because he fears heap allocation is expensive.

Discussing OOP and language concepts is really hard if you lack the theoretical underpinning. It is sad to say, but the best sources for this knowledge are academic CS books, though nowadays even Wikipedia is starting to have good articles on the subject.
Sep 27 2009
Chad J <chadjoan __spam.is.bad__gmail.com> writes:
Walter Bright wrote:
 ...

Admittedly I didn't read the whole thread. It is hueg liek xbox.

I'll try and explain this non-nullable by default thing in my own way.

Consider a programmer wanting to define a variable. I will draw a decision tree that they would use in a language that has non-nullable (and nullable) references:

    Programmer needs to declare reference variable...
                          |
                          |
                          |
                 Do they know how to
       yes <--------  initialize it?  --------> no
        |                                        |
        |                                        |
        v                                        |
    Type t = someExpression();                   |
                                                 v
                                 yes <------- Brains? -------> no
                                  |                            |
                                  v                            v
                              Type? t;                  Type t = dummy;
                        (Explicitly declare)          (Why would anyone)
                        (it to be nullable)           (do this?!?)

So having both kinds of reference types works out like that. Working with nulls as in current D is as easy as using a nullable type. When you need to pass a nullable type to a non-nullable variable or as a non-nullable function argument, you just manually check for the null like you should anyways:

    Type? t;
    ... code ...
    // If you're lazy.
    assert(t);
    func(t);

OR, better yet:

    Type? t;
    ... code ...
    if ( t )
        func(t);
    else
        // Explicitly handle the null value,
        // attempting error recovery if appropriate.

I actually don't know if the syntax would be that nice, but I can dream.

But I still haven't addressed the second part of this: Which is default? nullable or non-nullable? Currently nullable is the default. Let's consult a table.

    +---------------------+--------------+--------------+
    |                     |  default is  |  default is  |
    |                     | non-nullable |   nullable   |
    +---------------------+--------------+--------------+
    | Programmer DOESN'T  | Compiler     | Segfault in  |
    | initialize the var. | error.       | distant file |
    | ((s)he forgets)     | Fast fix.    |      *       |
    +---------------------+--------------+--------------+
    | Programmer DOES     | Everything   | Everything   |
    | initialize the var. | is fine.     | is fine.     |
    +---------------------+--------------+--------------+
    | Programmer uses     | They don't.  | They don't.  |
    | dummy variable.     |Nullable used.| Segfault in  |
    |                     | segfault**   | distant file*|
    +---------------------+--------------+--------------+

* They will have hours of good fun finding where the segfault-causing null came from. If the project is non-trivial, the null may have crossed hands over a number of function calls, ditched the police by hiding in a static variable or some class until the heat dies down, or whoops, aliasing. Sometimes stack traces help, sometimes they don't. We don't even have stack traces without hacking our D installs :/

** Same as *, but less likely, since functions are more likely to reject possibly-null values, and thus head off the null's escape routes at compile time.

I can see a couple issues with non-nullable by default:
- This: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=96834
- It complicates the language just a bit more. I'm willing to grudgingly honor this as a reason for not implementing the feature.
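D has no `Type?` syntax, so the following is only a library-level approximation of Chad's scheme, assuming a hypothetical `NonNull!T` wrapper (not a Phobos type) in the role of the non-nullable `Type`, with ordinary class references standing in for `Type?`. In present-day D2, `@disable this()` makes `NonNull!Widget w;` a compile-time error, which corresponds to the "Compiler error. Fast fix." cell of the table; converting from a nullable reference goes through an explicit check, as in Chad's `if ( t ) func(t);` example. `Widget`, `func` and `callWithCheck` are illustrative names.

```d
import std.exception : enforce;

// Hypothetical wrapper, not part of Phobos: a class reference that cannot be null.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    // No default construction: `NonNull!Widget w;` is rejected at compile time.
    @disable this();

    // Checked conversion from a nullable reference; throws on null.
    this(T value)
    {
        enforce(value !is null, "null where a non-null reference is required");
        payload = value;
    }

    // Read-only access that forwards member lookups to the wrapped object.
    @property T get() { return payload; }
    alias get this;
}

class Widget
{
    string name;
    this(string name) { this.name = name; }
}

// Takes a non-nullable reference, so it needs no null check of its own.
string func(NonNull!Widget w)
{
    return "widget " ~ w.name;
}

// A caller holding a nullable reference handles the null explicitly.
string callWithCheck(Widget t)
{
    if (t !is null)
        return func(NonNull!Widget(t));
    else
        return "no widget"; // explicit recovery path instead of a distant segfault
}
```

A built-in feature could prove non-nullness statically in many cases; this wrapper still pays for a runtime check at each conversion from a plain reference.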
Sep 26 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 26 Sep 2009 17:08:32 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  >  
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/  
  >
  >
  > :)

 I think he's wrong.

Analogies aside, we have 2 distinct problems here, with several solutions for each. I jotted down what I think the solutions being discussed are, along with the pros and cons of each.

Problem 1. Developer of a function wants to ensure non-null values are passed into his function.

   Solution 1: Rely on the hardware feature to do the checking for you.

   Pros: Easy to do, simple to implement, optimal performance (hardware's going to do this anyways).
   Cons: Runtime error instead of compile-time, error doesn't always occur close to the problem, not always easy to get a stack trace.

   Solution 2: Check for null once the values come into the function, throw an exception.

   Pros: Works with the exception system.
   Cons: Manual implementation required, performance hit for every function call, runtime error instead of compile-time, error doesn't always occur close to the problem.

   Solution 3: Build the non-null requirement into the function signature (note, the requirement is optional, it's still possible to use null references if you want).

   Pros: Easy to implement, compile-time error, hard to "work around" by putting a dummy value, sometimes no performance hit, most times very little performance hit, allows solution 1 and 2 if you want, runtime errors occur AT THE POINT things went wrong not later.
   Cons: Non-zero performance hit (you have to check for null sometimes before assignment!)

   Solution 4: Perform a null check for every dereference (The Java/C# solution).

   Pros: Works with the exception system, easy to implement.
   Cons: Huge performance hit (except in OSes where segfaults can be hooked), error doesn't always occur close to the problem.

-----------------------

Problem 2. Developer forgets to initialize a declared reference type, but uses it.

   Solution 1: Assign a default value of null. Rely on hardware to tell you when you use it later that you screwed up.

   Pros: Easy to do, simple to implement, optimal performance (hardware's going to do this anyways).
   Cons: Runtime error instead of compile-time, error doesn't always occur close to the problem, not always easy to get a stack trace.

   Solution 2: Require assignment, even if assignment to null. (The "simple" solution)

   Pros: Easy to implement, forces the developer to clarify his requirements -- reminding him that there may be a problem.
   Cons: May be unnecessary, forces the developer to make a decision, may result in a dummy value being assigned, reducing to solution 1.

   Solution 3: Build into the type the requirement that it can't be null, therefore checking for non-null on assignment. A default value isn't allowed. A nullable type is still allowed, which reduces to solution 1.

   Pros: Easy to implement, solution 1 is still possible, compile-time error on misuse, error occurs at the point things went wrong, no performance hit (except when you convert a nullable type to a non-nullable type), allows solution 3 for the first problem.
   Cons: Non-zero performance hit when assigning a nullable to a non-nullable type.

   Solution 4: Compiler performs flow analysis, giving an error when an unassigned variable is used. (The C# solution)

   Pros: Compile-time error; with good flow analysis, allows correct code even when assignment isn't done on declaration.
   Cons: Difficult to implement, sometimes can incorrectly require assignment if flow is too complex, can force developer to manually assign null or dummy value.

*NOTE* for solution 3 I purposely did NOT include the con that it makes people assign a dummy value. I believe this argument to be invalid, since it's much easier to just declare the variable as a nullable equivalent type (as other people have pointed out). That problem is more a factor of solutions 2 and 4.

----------------------

Anything I missed?

After looking at all the arguments, and brainstorming myself, I think I prefer the non-nullable defaults (I didn't have a position on this concept before this thread, and I had given it some thought).
I completely agree with Ary and some others who say "use C# for a while, and see how much it helps." I wrote C# code for a while, and I got those errors frequently; usually it was something I forgot to initialize or return. It definitely does not cause the "assign dummy value" syndrome Walter has suggested. Experience with languages that do a good job of letting the programmer know when he made an actual mistake makes a huge difference. I think the non-nullable default will result in even less of a temptation to assign a dummy value.

-Steve
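As a hedged illustration of Problem 1, Solution 2 from Steven's list (check on entry, throw an exception), here is a minimal D sketch using Phobos' `std.exception.enforce`. `Widget` and `describe` are hypothetical names. Because `enforce` is an ordinary function call rather than a contract, it is not removed by `-release`, which is the per-call cost listed among that solution's cons. The error still only surfaces when `describe` runs, not at compile time.

```d
import std.exception : enforce;

// Hypothetical example class.
class Widget
{
    string name;
    this(string name) { this.name = name; }
}

// Problem 1, Solution 2: the callee checks its argument on entry and
// throws an Exception, so the failure is reported at the call boundary
// instead of at some later dereference.
string describe(Widget w)
{
    enforce(w !is null, "describe: w must not be null");
    return "Widget " ~ w.name;
}
```

Compared with Solution 3, nothing in the signature of `describe` tells the caller that null is rejected; the requirement is only discoverable at run time or from documentation.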
Sep 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

    Build the non-null requirement into the function signature (note, the  
 requirement is optional, it's still possible to use null references if you  
 want).
 
    Pros: Easy to implement, Compile-time error, hard to "work around" by  
 putting a dummy value, sometimes no performance hit, most times very  
 little performance hit, allows solution 1 and 2 if you want, runtime  
 errors occur AT THE POINT things went wrong not later.
    Cons: Non-zero performance hit (you have to check for null sometimes  
 before assignment!)

To implement it well (and I think it has to be implemented well) it's not so easy to implement. You have to face the problem I've discussed about multiple object initializations inside various ifs. Also see what downs and I have said regarding arrays of nonnullables.

Among the cons you also have to consider that there's a little more complexity in the language (two different kinds of references, and such things must also be explained in the docs and understood by novice D programmers. It's not a common feature, so they have to learn it).

Another thing to add to the cons is that every layer of compile-time constraints you add to a language they also add a little amount of rigidity that has a cost (because you have to add ? and you sometimes may need casts to break such rigidity). Dynamic languages show that constraints have a cost.

Bye,
bearophile
Sep 27 2009
Yigal Chripun <yigal100 gmail.com> writes:
On 27/09/2009 17:51, bearophile wrote:
 Steven Schveighoffer:

 Build the non-null requirement into the function signature (note,
 the requirement is optional, it's still possible to use null
 references if you want).

 Pros: Easy to implement, Compile-time error, hard to "work around"
 by putting a dummy value, sometimes no performance hit, most times
 very little performance hit, allows solution 1 and 2 if you want,
 runtime errors occur AT THE POINT things went wrong not later.
 Cons: Non-zero performance hit (you have to check for null
 sometimes before assignment!)

 To implement it well (and I think it has to be implemented well) it's
 not so easy to implement. You have to face the problem I've discussed
 about multiple object initializations inside various ifs. Also see
 what downs and I have said regarding arrays of nonnullables.

I don't accept this argument about nested if statements. D has a procedural "if" statement. Of course it doesn't mesh well with non-nullable references; you're trying to fit a square peg in a round hole. The solution is to write more functional-style code. If D ever implements true tuples, that would be a perfect use case for them:

    (T1 t1, T2 t2) = init();
    t1.foo;
    t2.bar;
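D has no tuple-destructuring declaration like `(T1 t1, T2 t2) = init();`, but a rough approximation is possible with Phobos' `std.typecons.Tuple`: a helper builds both objects on every branch and returns them together, so the caller never holds a half-initialized pair. `T1`, `T2`, `foo` and `bar` come from the post; their bodies and the helper `makeBoth` are hypothetical (`makeBoth` stands in for `init()`, renamed to avoid confusion with D's built-in `.init` property).

```d
import std.typecons : Tuple, tuple;

class T1
{
    int x;
    this(int x) { this.x = x; }
    int foo() { return x; }
}

class T2
{
    int y;
    this(int y) { this.y = y; }
    int bar() { return y; }
}

// Every branch produces a fully constructed pair, so there is no
// window in which one of the two references is still null.
Tuple!(T1, T2) makeBoth(bool flag)
{
    return flag ? tuple(new T1(1), new T2(2))
                : tuple(new T1(10), new T2(20));
}
```

At the call site, `auto pair = makeBoth(cond);` followed by `pair[0].foo()` and `pair[1].bar()` plays the role of the destructuring assignment.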
 Among the cons you also have to consider that there's a little more
 complexity in the language (two different kinds of references, and
 such things must also be explained in the docs and understood by
 novice D programmers. It's not a common feature, so they have to
 learn it).

That's true. Not only does this need to be taught and pointed out to newbies, it should also be encouraged as the D way, so that it will be used by default.
 Another thing to add to the cons is that every layer of compile-time
 constraints you add to a language they also add a little amount of
 rigidity that has a cost (because you have to add ? and you sometimes
 may need casts to break such rigidity). Dynamic languages show that
 constraints have a cost.

 Bye, bearophile

Sep 27 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 27 Sep 2009 11:51:27 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

    Build the non-null requirement into the function signature (note, the
 requirement is optional, it's still possible to use null references if  
 you
 want).

    Pros: Easy to implement, Compile-time error, hard to "work around" by
 putting a dummy value, sometimes no performance hit, most times very
 little performance hit, allows solution 1 and 2 if you want, runtime
 errors occur AT THE POINT things went wrong not later.
    Cons: Non-zero performance hit (you have to check for null sometimes
 before assignment!)

 To implement it well (and I think it has to be implemented well) it's
 not so easy to implement. You have to face the problem I've discussed
 about multiple object initializations inside various ifs.

I think you are referring to a combination of this solution and flow analysis? I didn't mention that solution, but it is possible. I agree it would be more complicated, but I did list that as a con for flow analysis.
 Also see what downs and I have said regarding arrays of nonnullables.

Yes, arrays of non-nullables will be more cumbersome, I should add that as a con. Thanks.
 Among the cons you also have to consider that there's a little more  
 complexity in the language (two different kinds of references, and such  
 things must also be explained in the docs and understood by novice D  
 programmers. It's not a common feature, so they have to learn it).

It's not a common feature, but in practice one doesn't need nullable types in most cases; only certain cases need them. For example, no extra docs are needed for:

    auto a = new A(); // works, non-nullable is fine

And maybe even for:

    A a; // error, must assign non-null value

because that's a common feature of compilers. It's similar in my view to shared. Shared adds a level of complexity that needs to be understood if you want to use shared variables, but most of the time your variables are not shared, so no extra thought is required.
 Another thing to add to the cons is that every layer of compile-time  
 constraints you add to a language they also add a little amount of  
 rigidity that has a cost (because you have to add ? and you sometimes  
 may need casts to break such rigidity). Dynamic languages show that  
 constraints have a cost.

The cost needs to be weighed against the cost of the alternatives. I think all the solutions have a cost. Dynamic languages have a cost too. I've been developing in PHP lately, and I've lost count of the bugs where I slightly mistyped a variable name and it was still valid code, because the language thought I was just declaring a new variable :) And to get the IDE to recognize types, I sometimes have to put in a line like this:

    // uncomment for autocomplete
    // x = new ClassType(); printf("Error, please remove line %d\n", __LINE__); throw new Exception();

I comment it out when I'm running, but I uncomment it to have the IDE recognize that x is a ClassType (for autocomplete).

I think if there was a solution that cost nothing, it would be the clear winner.

-Steve
Sep 27 2009
Don <nospam nospam.com> writes:
Walter Bright wrote:
 Denis Koroskin wrote:
  > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
  > <newshound1 digitalmars.com> wrote:
  >> D has borrowed ideas from many different languages. The trick is to
  >> take the good stuff and avoid their mistakes <g>.
  >
  > How about this one:
  > 
 http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/ 
 
  >
  >
  > :)
 
 I think he's wrong.
 
 Getting rid of null references is like solving the problem of dead 
 canaries in the coal mines by replacing them with stuffed toys.

Let's go back a step. The problem being addressed is this: inadvertent null references are an EXTREMELY common bug in D. For example, it's a bug which *every* C++ refugee gets hit by. I have experienced it ridiculously often in D.

*** The problem of null references is an order of magnitude worse in D than in C++, because classes in D use reference semantics. ***

Eliminating that category of bug at compile time would have a huge benefit for code quality. "Non-nullable references by default" is just a proposed solution.

Maybe if D had better flow analysis, the demand for non-nullable references wouldn't be so great. (Neither is a pure subset of the other, flow analysis works for all variables, non-nullable references catches more complex logic errors. But there is a very significant overlap).

Interestingly, while working on CTFE, I noticed that the CTFE code has a lot in common with flow analysis. I can easily imagine the same code being reused.
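A small sketch of the reference-semantics point, with hypothetical types `Node` (a class) and `Point` (a struct). A C++ programmer who writes `Foo f;` for a class type with a default constructor gets a constructed object; in D the same declaration silently yields a null reference when `Foo` is a class, while a D struct behaves like the value the C++ programmer expected.

```d
// Classes have reference semantics: a declared but unassigned class
// variable is default-initialized to null.
class Node
{
    int x;
}

// Structs have value semantics: a declared struct variable is a real
// value, initialized to Point.init.
struct Point
{
    int x;
}

bool classDefaultIsNull()
{
    Node n;          // compiles without complaint; n is null
    // n.x = 1;      // would be a null dereference at run time
    return n is null;
}

int structDefaultX()
{
    Point p;         // usable immediately
    return p.x;      // 0, taken from Point.init
}
```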
Sep 29 2009
bearophile <bearophileHUGS lycos.com> writes:
Don:

 Maybe if D had better flow analysis, the demand for 
 non-nullable references wouldn't be so great.

I know a fairly good C# programmer who agrees with you; he says that thanks to the flow analysis the C# compiler performs, the need for non-nullable references is not so strong.
 (Neither is a pure subset of the other, flow analysis works for all 
 variables, non-nullable references catches more complex logic errors. 
 But there is a very significant overlap).

I like how you can see things a little more clearly than other people (like me). Flow analysis helps for all variables, but it's limited in scope. Non-nullable references are a program-wide contract: their effect extends to called functions, etc., and they help avoid null tests inside them too. Of the two features, flow analysis is probably the more important. I think having both is better; they can work in synergy.

Bye,
bearophile
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Don:
 
 Maybe if D had better flow analysis, the demand for 
 non-nullable references wouldn't be so great.

 I know a fairly good C# programmer who agrees with you; he says that
 thanks to the flow analysis the C# compiler performs, the need for
 non-nullable references is not so strong.

Which is what I've said half a dozen times in this thread :) Disclaimer: I have only read about C#; I haven't coded in it.
 (Neither is a pure subset of the other, flow analysis works for all 
 variables, non-nullable references catches more complex logic errors. 
 But there is a very significant overlap).

 I like how you can see things a little more clearly than other people
 (like me). Flow analysis helps for all variables, but it's limited in
 scope. Non-nullable references are a program-wide contract: their
 effect extends to called functions, etc., and they help avoid null
 tests inside them too. Of the two features, flow analysis is probably
 the more important. I think having both is better; they can work in
 synergy.

 Bye,
 bearophile

Flow analysis must be implemented by the compiler; nonnull references can be enforced by a runtime wrapper (much like smart_ptr enforces addref and release calls in C++; you don't see smart_ptr being moved into the language spec, even if half the C++ community would drool over the idea).

The best thing about flow analysis is that we could take away the whole default initializer idea, since it was made to make uninitialized-variable errors easy to pinpoint in the first place, not as a convenience to turn "int a = 0;" into "int a;".

Besides, DMD must have some basic flow analysis already, since it does notice when a code path does not return; it just needs to be extended to include uninitialized variables.
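A minimal sketch of the kind of runtime wrapper Jeremie describes, assuming a D2 compiler that supports `@disable this();` on structs. `NonNull` and `nameOf` are hypothetical names, not a Phobos API. The null check runs once, at construction, so the error appears where the null enters rather than at a distant dereference; disabling the default constructor also gives a limited compile-time guarantee, since an unset `NonNull!T` cannot be declared. It does not make ordinary class references non-nullable; that would still need language support.

```d
// Hypothetical wrapper: holds a class reference that was checked for
// null once, when the wrapper was constructed.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    // Disabling the default constructor makes `NonNull!T x;` a
    // compile-time error: an unset wrapper cannot be declared.
    @disable this();

    // The only way in is through this run-time check.
    this(T value)
    {
        if (value is null)
            throw new Exception("NonNull: null reference");
        payload = value;
    }

    T get() { return payload; }

    // Lets a NonNull!T be used where a T is expected.
    alias get this;
}

// Compile-time part: declaring an uninitialized wrapper is rejected.
static assert(!__traits(compiles, { NonNull!Object x; }));

// A function that can only be called with an already-checked reference.
string nameOf(NonNull!Object o)
{
    return o.toString(); // forwarded to Object.toString via alias this
}
```

The design trades one check at the boundary for the freedom to skip null tests inside functions such as `nameOf`, which is the "program-wide contract" effect discussed earlier in the thread, but only for code that opts into the wrapper.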
Sep 29 2009
bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 Flow analysis must be implemented by the compiler, nonnull references 
 can be enforced by a runtime wrapper

The point of nonnull references is all in its compile-time enforced constraints.
 Besides DMD must have some basic flow analysis already since it does 
 notice when a code path does not return, it just needs to be extended to 
 include uninitialized variables.

You have probably missed them, but flow analysis in D was discussed a lot in the past. I don't think Walter wants to implement it. If you help implement it, showing that it can be done, he may change his mind. Bye, bearophile
Sep 29 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Jeremie Pelletier:
 
 Flow analysis must be implemented by the compiler, nonnull references 
 can be enforced by a runtime wrapper

 The point of nonnull references is all in its compile-time enforced
 constraints.
 Besides DMD must have some basic flow analysis already since it does 
 notice when a code path does not return, it just needs to be extended to 
 include uninitialized variables.

 You have probably missed them, but flow analysis in D was discussed a
 lot in the past. I don't think Walter wants to implement it. If you
 help implement it, showing that it can be done, he may change his mind.

 Bye,
 bearophile

I'll try and hack at it in a few weeks when I get some free time. It's definitely high on my D wishlist.

Jeremie
Sep 29 2009