
digitalmars.D - Everyone who writes safety critical software should read this

reply Walter Bright <newshound2 digitalmars.com> writes:
https://news.ycombinator.com/item?id=6636811

I know that everyone is tired of hearing my airframe design stories, but it's 
obvious to me that few engineers understand the principles of failsafe design. 
This article makes that abundantly clear - and the consequences of paying no 
attention to it.

You can add in Fukushima and Deepwater Horizon as more costly examples of 
ignorance of basic failsafe design principles.

Yeah, I feel strongly about this.
Oct 29 2013
next sibling parent reply "qznc" <qznc web.de> writes:
On Tuesday, 29 October 2013 at 20:38:08 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=6636811

 I know that everyone is tired of hearing my airframe design 
 stories, but it's obvious to me that few engineers understand 
 the principles of failsafe design. This article makes that 
 abundantly clear - and the consequences of paying no attention 
 to it.

 You can add in Fukushima and Deepwater Horizon as more costly 
 examples of ignorance of basic failsafe design principles.

 Yeah, I feel strongly about this.
Maybe you should write an article about "Failsafe Design Principles"? Some quick googling did not turn up anything useful. Only horror stories and anti-examples. The only thing I found is a Star Wars reference [0], which gives the principle "Base access decisions on permission rather than exclusion".

[0] http://emergentchaos.com/archives/2005/11/friday-star-wars-principle-of-fail-safe-defaults.html
Oct 29 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 2:22 PM, qznc wrote:
 On Tuesday, 29 October 2013 at 20:38:08 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=6636811

 I know that everyone is tired of hearing my airframe design stories, but it's
 obvious to me that few engineers understand the principles of failsafe design.
 This article makes that abundantly clear - and the consequences of paying no
 attention to it.

 You can add in Fukushima and Deepwater Horizon as more costly examples of
 ignorance of basic failsafe design principles.

 Yeah, I feel strongly about this.
Maybe you should write an article about "Failsafe Design Principles"? Some quick googling did not turn up anything useful. Only horror stories and anti-examples.
I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's probably scrolled off their system.
Oct 29 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 2:38 PM, Walter Bright wrote:
 I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts".
It's
 probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
Oct 29 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 29, 2013 at 02:39:59PM -0700, Walter Bright wrote:
 On 10/29/2013 2:38 PM, Walter Bright wrote:
I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's
probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
This article refers to a "next instalment", but I couldn't find it. Do you have a link handy?


T

-- 
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Oct 29 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 3:16 PM, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 02:39:59PM -0700, Walter Bright wrote:
 On 10/29/2013 2:38 PM, Walter Bright wrote:
 I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts".
It's
 probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
This article refers to a "next instalment", but I couldn't find it. Do you have a link handy?
http://www.drdobbs.com/architecture-and-design/designing-safe-software-systems-part-2/228701618
Oct 29 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 29, 2013 at 05:08:57PM -0700, Walter Bright wrote:
 On 10/29/2013 3:16 PM, H. S. Teoh wrote:
On Tue, Oct 29, 2013 at 02:39:59PM -0700, Walter Bright wrote:
On 10/29/2013 2:38 PM, Walter Bright wrote:
I wrote one for DDJ a few years back, "Safe Systems from Unreliable
Parts". It's probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
This article refers to a "next instalment", but I couldn't find it. Do you have a link handy?
http://www.drdobbs.com/architecture-and-design/designing-safe-software-systems-part-2/228701618
Thanks! Is there a third instalment, or is this it?


T

-- 
That's not a bug; that's a feature!
Oct 29 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 5:54 PM, H. S. Teoh wrote:
 Is there a third instalment, or is this it?
That's it.
Oct 29 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 6:55 PM, Walter Bright wrote:
 On 10/29/2013 5:54 PM, H. S. Teoh wrote:
 Is there a third instalment, or is this it?
That's it.
The ideas are actually pretty simple. The hard parts are:

1. Convincing engineers that this is the right way to do it.

2. Convincing people that improving quality, better testing, hiring better engineers, government licensing for engineers, following MISRA standards, etc., are not the solution. (Note that all of the above were proposed in the HN thread.)

3. Beating out of engineers the hubris that "this part I designed will never fail!" Jeepers, how often I've heard that.

4. Developing a mindset of "what happens when this part fails in the worst way."

5. Learning to recognize inadvertent coupling between the primary and backup systems.

6. Being familiar with the case histories of failure of related designs.

7. Developing a system to track failures, the resolutions, and check that new designs don't suffer from the same problems. (Much like D's bugzilla, the test suite, and the auto-tester.)
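A minimal sketch of what point 7 looks like at the code level (the function and the issue number below are invented purely for illustration): once a reported failure is captured as a D unittest, the auto-tester re-runs it on every change, so the same mistake cannot quietly come back.

// Hypothetical example: pinning a reported failure down with a test.
int parseDigit(char c)
{
    // Original (invented) bug: non-digit input fell through and
    // returned garbage. The fix fails detectably instead.
    if (c < '0' || c > '9')
        return -1;
    return c - '0';
}

unittest
{
    // Regression test for (hypothetical) issue #1234.
    assert(parseDigit('7') == 7);
    assert(parseDigit('x') == -1);  // the input that originally failed
}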
Oct 29 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
[...]
 The ideas are actually pretty simple. The hard parts are:
 
 1. Convincing engineers that this is the right way to do it.
Yeah, if you had said this to me many years ago, I'd have rejected it. Sadly, it's only with hard experience that one comes to acknowledge wisdom.
 2. Convincing people that improving quality, better testing, hiring
 better engineers, government licensing for engineers, following
 MISRA standards, etc., are not the solution. (Note that all of the
 above were proposed in the HN thread.)
Ha. And yet where do we see companies pouring all that money into? Precisely into improving quality, improving test coverage, inventing better screening for hiring engineers, and in many places, requiring pieces of paper to certify that candidate X has successfully completed program Y sponsored by large corporation Z, which purportedly has a good reputation that therefore (by some inscrutable leap of logic) translates to proof that candidate X is capable of producing better code, which therefore equates to the product being made ... safer? Hmm. Something about the above line of reasoning seems to be fishy somewhere. :P

(And don't even get me started on the corporate obsession with standards bearing acronymic buzzword names that purportedly will solve everything from software bugs to world hunger. As though the act of writing the acronym into the company recommended practices handbook [which we all know everybody loves to read and obey, to the letter] will actually change anything.)
 3. Beating out of engineers the hubris that "this part I designed
 will never fail!" Jeepers, how often I've heard that.
"This piece of code is so trivial, and so obviously, blatantly correct, that it serves as its own proof of correctness." (Later...) "What do you *mean* the unit tests are failing?!"
 4. Developing a mindset of "what happens when this part fails in the
 worst way."
I wish software companies would adopt this mentality. It would save so many headaches I get just from *using* software as an end-user (don't even mention what I have to do at work as a software developer).
 5. Learning to recognize inadvertent coupling between the primary
 and backup systems.
If there even *is* a backup system... :P I think a frighteningly high percentage of enterprise software fails this criterion.
 6. Being familiar with the case histories of failure of related
 designs.
They really should put this into the CS curriculum.
 7. Developing a system to track failures, the resolutions, and check
 that new designs don't suffer from the same problems. (Much like D's
 bugzilla, the test suite, and the auto-tester.)
I like how the test suite actually (mostly?) consists of failing cases from actual reported bugs, which the autotester then tests for, thus ensuring that the same bugs don't happen again.

Most software companies have bug trackers, I'm pretty sure, but it's pretty scary how few of them actually have an *automated* system in place to ensure that previously-fixed bugs don't recur. Some places rely on the QA department doing manual testing over some standard checklist that may have no resemblance whatsoever to previously-fixed bugs, as though it's "good enough" that the next patch release (which is inevitably not just a "patch" but a full-on new version packed with new, poorly-tested features) doesn't outright crash on the most basic functionality. Use anything more complex than trivial, everyday tasks? With any luck, you'll crash within the first 5 minutes of using the new version just by hitting previously-fixed bugs that got broken again. Which then leads to today's mentality of "let's *not* upgrade until everybody else has crashed the system to bits and the developers have been shamed into fixing them, then maybe things won't break as badly when we do upgrade".

For automated testing to be practical, of course, requires that the system be designed to be tested in that way in the first place -- which unfortunately very few programmers have been trained to do. "Whaddya mean, make my code modular and independently testable? I've a deadline by 12am tonight, and I don't have time for that! Just hardcode the data into the global variables and get the product out the door before the midnight bell strikes; who cares if this thing is testable, as long as the customer thinks it looks like it works!"

Sigh.


T

-- 
Береги платье снову, а здоровье смолоду. (Take care of your clothes while they are new, and of your health while you are young.)
Oct 30 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/30/2013 12:24 PM, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
 Ha. And yet where do we see companies pouring all that money into?
 Precisely into improving quality, improving test coverage, inventing
 better screening for hiring engineers, and in many places, requiring
 pieces of paper to certify that candidate X has successfully completed
 program Y sponsored by large corporation Z, which purportedly has a good
 reputation that therefore (by some inscrutable leap of logic) translates
 to proof that candidate X is capable of producing better code, which
 therefore equates to the product being made ... safer? Hmm. Something
 about the above line of reasoning seems to be fishy somewhere. :P
There's still plenty of reason to improve software quality. I just want to emphasize that failsafe system design is not about improving quality.
Oct 30 2013
parent reply "eles" <eles eles.com> writes:
On Wednesday, 30 October 2013 at 20:06:19 UTC, Walter Bright 
wrote:
 On 10/30/2013 12:24 PM, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
There's still plenty of reason to improve software quality. I just want to emphasize that failsafe system design is not about improving quality.
I have followed this thread for a while, and it happens that I am working on this kind of software. I can't really say why it works, but several elements help with that:

-some quite strict code style guidelines (it also helps a lot when working on some legacy code)

-a small "safety" team whose sole job is to question the code produced by us (I am among those developing) on the basis: "here, here, what happens if this fails?"

-some analysis and traceability (you know: the "process" thing) tools that help both with code (MISRA-C, clang etc.) and documentation

-good bug tracking and thorough discussion of the problem at hand before and after implementation

-the developers themselves questioning every LOC they write

No code is accepted unless it has a way to fail gracefully. For this reason, unrolling changes in case of errors makes up a large proportion of the code, so having that scope() statement would be pure gold for me. Basically, I think that critical code is almost always developed as if being transaction-based. It succeeds or it leaves no trace.

OTOH, things that I would really like to work better are:

-greater flexibility of the management when a developer tells them: "I think this code part could be improved and some refactoring will help"

-an incremental process, that is the management should assume that the first shipped version is not perfect instead of assuming that it is perfect and not being prepared for change requests
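For what it's worth, here is a minimal sketch (all names invented) of the transaction-style, leave-no-trace pattern described above, using D's scope(failure)/scope(success) statements to unroll partial work automatically:

import std.exception : enforce;

// Toy stand-in for whatever state the real system mutates.
struct Transaction
{
    string[] log;        // steps applied so far
    bool committed;

    void apply(string step) { log ~= step; }
    void rollback()         { log.length = 0; }   // undo everything
    void commit()           { committed = true; }
}

// Succeeds completely or leaves no trace.
void update(ref Transaction tx, int newValue)
{
    scope(failure) tx.rollback();   // runs only if we leave via an exception
    scope(success) tx.commit();     // runs only on normal completion

    tx.apply("write value");
    enforce(newValue >= 0, "invalid value");   // simulated mid-way failure
    tx.apply("write checksum");
}

unittest
{
    Transaction tx;
    try { update(tx, -1); } catch (Exception e) {}
    assert(tx.log.length == 0 && !tx.committed);   // nothing half-done survives
}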
Oct 31 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/31/2013 9:00 AM, eles wrote:
 Basically, I think that critical code is almost always developed as if being
 transaction-based. It succeeds or it leaves no trace.
That's great for the software. What if the hardware fails? Such as a bad memory bit that flips a bit in the perfect software, and now it decides to launch nuclear missiles?
Oct 31 2013
next sibling parent Martin Drasar <drasar ics.muni.cz> writes:
On 31.10.2013 19:46, Walter Bright wrote:
 On 10/31/2013 9:00 AM, eles wrote:
 Basically, I think that critical code is almost always developed as if
 being
 transaction-based. It succeeds or it leaves no trace.
That's great for the software. What if the hardware fails? Such as a bad memory bit that flips a bit in the perfect software, and now it decides to launch nuclear missiles?
Three different pieces of software (written by different teams) that should do the same thing and then hold a consensus vote on the correct action? Or even more pieces, depending on the clusterfuck that can be caused by a flipped bit...

The interaction with hardware can be a bit tricky, and after all anything can go wrong in the right circumstances, no matter how hard you try. It is up to you to decide cost/benefit.
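A 2-out-of-3 majority vote is simple to express. The sketch below is illustrative only: a result is accepted when at least two independently computed channels agree, and otherwise the system refuses to act, which is the fail-safe outcome.

import std.typecons : Nullable, nullable;

// Majority value of three independently computed results, or "no value"
// when there is no majority.
Nullable!int vote(int a, int b, int c)
{
    if (a == b || a == c) return nullable(a);
    if (b == c)           return nullable(b);
    return Nullable!int.init;   // total disagreement: do not act
}

unittest
{
    assert(vote(4, 4, 9).get == 4);   // one faulty channel is outvoted
    assert(vote(1, 2, 3).isNull);     // no consensus -> no output
}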
Oct 31 2013
prev sibling parent reply "eles" <eles eles.com> writes:
On Thursday, 31 October 2013 at 18:46:07 UTC, Walter Bright wrote:
 On 10/31/2013 9:00 AM, eles wrote:
 What if the hardware fails? Such as a bad memory bit that flips 
 a bit in the perfect software, and now it decides to launch 
 nuclear missiles?
If that happens, any software verification could become useless. On the latest project that I'm working on, we simply went with two identical (but not independently developed, just identical) hardware units, with the embedded software on them. A comparator compares the two outputs. Any difference results in an emergency procedure (either a hardware reboot through a watchdog, or a controlled shutdown - to avoid an infinite reboot loop).
Oct 31 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/31/2013 2:24 PM, eles wrote:
 On Thursday, 31 October 2013 at 18:46:07 UTC, Walter Bright wrote:
 On 10/31/2013 9:00 AM, eles wrote:
 What if the hardware fails? Such as a bad memory bit that flips a bit in the
 perfect software, and now it decides to launch nuclear missiles?
If that happens, any software verification could become useless. On the latest project that I'm working on, we simply went with two identical (but not independently developed, just identical) hardware units, with the embedded software on them. A comparator compares the two outputs. Any difference results in an emergency procedure (either a hardware reboot through a watchdog, or a controlled shutdown - to avoid an infinite reboot loop).
What I posted on HN:

------------------

All I know in detail is the 757 system, which uses triply-redundant hydraulic systems. Any computer control of the flight control systems (such as the autopilot) can be quickly locked out by the pilot who then reverts to manual control.

The computer control systems were dual, meaning two independent computer boards. The boards were designed independently, had different CPU architectures on board, were programmed in different languages, were developed by different teams, the algorithms used were different, and a third group would check that there was no inadvertent similarity.

An electronic comparator compared the results of the boards, and if they differed, automatically locked out both and alerted the pilot. And oh yea, there were dual comparators, and either one could lock them out.

This was pretty much standard practice at the time. Note the complete lack of "we can write software that won't fail!" nonsense. This attitude permeates everything in airframe design, which is why air travel is so incredibly safe despite its inherent danger.

https://news.ycombinator.com/item?id=6639097
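Purely as an illustration of the comparator-and-lockout idea (this is not the actual avionics logic; the names and tolerance are invented), the shape of it in code might be: two independently computed commands pass through only while they agree, and any disagreement latches a lockout so control reverts to the pilot.

import std.math : abs;
import std.typecons : Nullable, nullable;

// Illustrative latching comparator between two independent channels.
struct Comparator
{
    bool lockedOut;
    enum tolerance = 1e-6;   // invented agreement threshold

    // Returns the agreed command, or locks both channels out on disagreement.
    Nullable!double check(double channelA, double channelB)
    {
        if (lockedOut || abs(channelA - channelB) > tolerance)
        {
            lockedOut = true;             // latch: alert the pilot, go manual
            return Nullable!double.init;
        }
        return nullable(channelA);
    }
}

unittest
{
    Comparator cmp;
    assert(!cmp.check(1.0, 1.0).isNull);   // channels agree: pass through
    assert(cmp.check(1.0, 2.0).isNull);    // disagreement: locked out
    assert(cmp.check(1.0, 1.0).isNull);    // and it stays locked out
}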
Oct 31 2013
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 31 Oct 2013 17:00:59 +0100
schrieb "eles" <eles eles.com>:

 -an incremental process, that is the management should assume 
 that the first shipped version is not perfect instead of assuming 
 that it is perfect and not being prepared for change requests
I would discriminate between change requests and bug reports. You should be responsible for any bugs and fix them, but changes resulting from unclear specifications are entirely different. I wouldn't do any more real work on the project than is written down in the contract. (Meaning: Be prepared for the changes you allowed for, not random feature requests.) -- Marco
Oct 31 2013
parent reply "eles" <eles eles.com> writes:
On Thursday, 31 October 2013 at 20:32:49 UTC, Marco Leise wrote:
 Am Thu, 31 Oct 2013 17:00:59 +0100
 schrieb "eles" <eles eles.com>:
 I would discriminate between change requests and bug reports.
 You should be responsible for any bugs and fix them, but
 changes resulting from unclear specifications are entirely
 different. I wouldn't do any more real work on the project
 than is written down in the contract. (Meaning: Be prepared
 for the changes you allowed for, not random feature requests.)
Yeah, maybe is a corporation culture to avoid the term "bug", but we always use the term "change request". Maybe it has a better image :)

Normally, it is assumed that passing the tests proves that specifications are accomplished, so the software is perfect. This, of course, if the tests themselves would be correct 100% and *really* extensive. Or, some things like race conditions and other heisenbugs occur only rarely. So, you still need to conceptualize and so on.

In practice, fixing a bug is not really different from evolving the code, except that the former is more urgent. Anyway, in the end, it is the guy with the budget that decides.

It is an iterative process, however: you start with some ideas, you implement some code, you go back to the architecture description and change it a bit, in the meantime you receive a request to add some new specification or functionality, so back to square one and so on. But this is development. What really ensures the quality is that, at the end, before shipping, all the steps are once again checked, this time in the normal, forward mode: requirements, architecture, code review, tests. *Only then* is it compiled and finally passed to... well, not to production, but to the dedicated Validation team.
Oct 31 2013
parent reply "Wyatt" <wyatt.epp gmail.com> writes:
On Thursday, 31 October 2013 at 21:36:11 UTC, eles wrote:
 Yeah, maybe is a corporation culture to avoid the term "bug", 
 but we always use the term "change request". Maybe it has a 
 better image :)
Lately, I've instead been reframing my thinking toward parity with Dijkstra. EWD1036 [0] is particularly relevant to this topic:

"We could, for instance, begin with cleaning up our language by no longer calling a bug a bug but by calling it an error. It is much more honest because it squarely puts the blame where it belongs, viz. with the programmer who made the error. The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation. The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be 'almost correct', afterwards a program with an error is just 'wrong' (because in error)."

As a bonus, my experience is it more readily encourages management types to accept that fixing them is important.
 Normally, it is assumed that passing the tests proves that 
 specifications are accomplished, so the software is perfect.

 This, of course, if the tests themselves would be correct 100% 
 and *really* extensive.
Again from EWD1036:

"Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications."

This passage comes just after he laments that "software engineer" had been diluted so thoroughly as to be meaningless. (I'd greatly appreciate if this term could be reclaimed, honestly. Experience has shown me quite clearly that not every programmer is an engineer.)

-Wyatt

[0] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1036.html
Nov 01 2013
next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 1 November 2013 at 13:52:01 UTC, Wyatt wrote:
 On Thursday, 31 October 2013 at 21:36:11 UTC, eles wrote:
 Yeah, maybe is a corporation culture to avoid the term "bug", 
 but we always use the term "change request". Maybe it has a 
 better image :)
Lately, I've instead been reframing my thinking toward parity with Dijkstra. EWD1036 [0] is particularly relevant to this topic: "We could, for instance, begin with cleaning up our language by no longer calling a bug a bug but by calling it an error. It is much more honest because it squarely puts the blame where it belongs, viz. with the programmer who made the error. The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation. The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be 'almost correct', afterwards a program with an error is just 'wrong' (because in error)." As a bonus, my experience is it more readily encourages management types to accept that fixing them is important.
 Normally, it is assumed that passing the tests proves that 
 specifications are accomplished, so the software is perfect.

 This, of course, if the tests themselves would be correct 100% 
 and *really* extensive.
Again from EWD1036: "Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications." This passage comes just after he laments that "software engineer" had been diluted so thoroughly as to be meaningless. (I'd greatly appreciate if this term could be reclaimed, honestly. Experience has shown me quite clearly that not every programmer is an engineer.) -Wyatt [0] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1036.html
No, not every programmer is an engineer. But not every programmer writes safety critical code. If Firefox crashes, nobody dies as a consequence (hopefully!).
Nov 01 2013
prev sibling next sibling parent "eles" <eles eles.com> writes:
On Friday, 1 November 2013 at 13:52:01 UTC, Wyatt wrote:
 On Thursday, 31 October 2013 at 21:36:11 UTC, eles wrote:
 much more honest because it squarely puts the blame where it 
 belongs, viz. with the programmer who made the error. The
That's in an ideal world. When different people work on the same code base, it is not so easy to tell who made the error. Look at a race condition where none of two or three developers takes the mutex. Who made the error then?

All that you have is a buggy program (btw, error implies something systematic, while bugs are not necessarily) or a program with errors. But telling *who* made the error is not that simple. And, in most cases, it would also be quite useless. We do not hunt people, but bugs :p (sorry, it sounds better than hunting errors :)
 testing may convincingly demonstrate the presence of bugs, but 
 can never demonstrate their absence.
Everybody knows that. Alas, testing is not the silver bullet, but at least it is a bullet. Just imagine how software shipped without any testing would behave: "it compiles! let's ship it!"

Corporations are not chasing philosophical perfection; they are pragmatic. The thing that somewhat works and that they have on the table is testing. In a perfect world, you'd have perfect programmers and perfect programs. The thing is, you are not living in a perfect world. Tests are not perfect either, but they are among the best tools you can get.
Nov 01 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/1/2013 6:52 AM, Wyatt wrote:
 "We could, for instance, begin with cleaning up our language by no longer
 calling a bug a bug but by calling it an error. It is much more honest because
 it squarely puts the blame where it belongs, viz. with the programmer who made
 the error.
Although it is tempting to do so, creating a culture of "blame the programmer" for the mistakes he's made also creates a culture of denial of problems. If you want to create quality software, a far better culture is one that recognizes that people are imperfect, and looks for collaborative ways to engineer the possibility of errors out of the system. That doesn't work if you're trying to pin the blame on somebody.
Nov 01 2013
parent "Wyatt" <wyatt.epp gmail.com> writes:
On Saturday, 2 November 2013 at 04:06:46 UTC, Walter Bright wrote:
 On 11/1/2013 6:52 AM, Wyatt wrote:
 "We could, for instance, begin with cleaning up our language 
 by no longer
 calling a bug a bug but by calling it an error. It is much 
 more honest because
 it squarely puts the blame where it belongs, viz. with the 
 programmer who made
 the error.
Although it is tempting to do so, creating a culture of "blame the programmer" for the mistakes he's made also creates a culture of denial of problems. If you want to create quality software, a far better culture is one that recognizes that people are imperfect, and looks for collaborative ways to engineer the possibility of errors out of the system. That doesn't work if you're trying to pin the blame on somebody.
My reading was less that an error should be hauled out as an indictment of the individual who made it and more that we should collectively be more cognizant of our own fallible nature and accept that that affects the work we do. In that vein the _who_ is less important than the explicit understanding that some human (probably me) mucked up. Of course, I tend to read EWD with a fair bit of fabric softener; he was a grumpy old man on a mission.

Though even with the more literal interpretation, I'm not sure I agree that it necessarily has to be negative. If I'm in error, I honestly _want_ to know. How it's conveyed is a function of the culture, which can make it a positive (learning) experience or a negative one.

-Wyatt
Nov 04 2013
prev sibling next sibling parent reply "growler" <growlercab gmail.com> writes:
On Wednesday, 30 October 2013 at 19:25:45 UTC, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
 [...]
 For automated testing to be practical, of course, requires that 
 the
 system be designed to be tested in that way in the first place 
 -- which
 unfortunately very few programmers have been trained to do. 
 "Whaddya
 mean, make my code modular and independently testable? I've a 
 deadline
 by 12am tonight, and I don't have time for that! Just hardcode 
 the data
 into the global variables and get the product out the door 
 before the
 midnight bell strikes; who cares if this thing is testable, as 
 long as
 the customer thinks it looks like it works!"

 Sigh.


 T
Agree 100%. I read a book way back in the late 1990's, "Rapid Development" by Steve McConnell I think it was called. I remember it was a great read and filled with case studies where development best practices are dissolved by poor management. This Toyota story reads very much like the examples in that book.
Oct 30 2013
parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 30 October 2013 at 22:31:45 UTC, growler wrote:
 On Wednesday, 30 October 2013 at 19:25:45 UTC, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
 [...]
 For automated testing to be practical, of course, requires 
 that the
 system be designed to be tested in that way in the first place 
 -- which
 unfortunately very few programmers have been trained to do. 
 "Whaddya
 mean, make my code modular and independently testable? I've a 
 deadline
 by 12am tonight, and I don't have time for that! Just hardcode 
 the data
 into the global variables and get the product out the door 
 before the
 midnight bell strikes; who cares if this thing is testable, as 
 long as
 the customer thinks it looks like it works!"

 Sigh.


 T
Agree 100%. I read a book way back in the late 1990's, "Rapid Development" by Steve McConnell I think it was called. I remember it was a great read and filled with case studies where development best practices are dissolved by poor management. This Toyota story reads very much like the examples in that book.
Mind you, corporate ideology might be just as harmful as bad engineering. I'm sure there is the odd engineer who points out a thing or two to the management, but they will have none of that. German troops in Russia were not provided with winter gear, because the ideology of the leadership dictated (this is the right word) that Moscow be taken before winter. I wouldn't rule it out that "switch-off-engine buttons" are taboo in certain companies for purely ideological reasons.
Oct 30 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 30 October 2013 at 19:25:45 UTC, H. S. Teoh wrote:
 "This piece of code is so trivial, and so obviously, blatantly 
 correct,
 that it serves as its own proof of correctness." (Later...) 
 "What do you
 *mean* the unit tests are failing?!"
I have quite a lot of horror stories about this kind of code :D

Now I do not try to argue with people who come to me with this; I simply write a test. Usually you don't need to get very far: absurdly high volume, malformed input, constrained memory, run the thing in a thread and kill the thread in the middle, etc.

Hopefully, it is much less common for me now to have to do so.

A programming school in France, which is well known for having uncommon practices (but forms great people in the end), runs every program submitted by the students in an environment with 8 KB of RAM. The program is not expected to do its job, but to at least fail properly.
 Most software companies have bug trackers,
I used to work in a company with a culture strongly opposed to the use of such a tool, for some reason I still do not understand. At some point I simply told people that bugs didn't exist when they weren't in the bug tracker.
 For automated testing to be practical, of course, requires that 
 the
 system be designed to be tested in that way in the first place 
 -- which
 unfortunately very few programmers have been trained to do. 
 "Whaddya
 mean, make my code modular and independently testable? I've a 
 deadline
 by 12am tonight, and I don't have time for that! Just hardcode 
 the data
 into the global variables and get the product out the door 
 before the
 midnight bell strikes; who cares if this thing is testable, as 
 long as
 the customer thinks it looks like it works!"
My experience tells me that this pays off in a matter of days. Days as in less than a week. Doing the hacky stuff feels like it is faster, but measurement says otherwise.
Oct 30 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 31, 2013 at 02:17:59AM +0100, deadalnix wrote:
 On Wednesday, 30 October 2013 at 19:25:45 UTC, H. S. Teoh wrote:
"This piece of code is so trivial, and so obviously, blatantly
correct, that it serves as its own proof of correctness." (Later...)
"What do you *mean* the unit tests are failing?!"
I have quite a lot of horror stories about this kind of code :D Now I do not try to argue with people who come to me with this; I simply write a test. Usually you don't need to get very far: absurdly high volume, malformed input, constrained memory, run the thing in a thread and kill the thread in the middle, etc.
A frighteningly high percentage of regular code already fails for trivial boundary conditions (like passing in an empty list, or NULL, or an empty string), not even getting to unusual input or stress tests.
 Hopefully, it is much less common for me now to have to do so.
 
 A programming school in France, which is well known for having
 uncommon practices (but forms great people in the end), runs every
 program submitted by the students in an environment with 8 KB of RAM.
 The program is not expected to do its job, but to at least fail
 properly.
Ha. I should go to that school and write programs that don't need more than 8KB of RAM to work. :) I used to pride myself on programs that require the absolute minimum of resources to work. (Unfortunately, I can't speak well of the quality of the code though! :P)
Most software companies have bug trackers,
I used to work in a company with a culture strongly opposed to the use of such a tool, for some reason I still do not understand. At some point I simply told people that bugs didn't exist when they weren't in the bug tracker.
Wow. No bug tracker?? That's just insane. How do they keep track of anything?? At my current job, we actually use the bug tracker not just for actual bugs but for tracking project discussions (via bug notes that serve as good reference later when we need to review why a particular decision was made).
For automated testing to be practical, of course, requires that the
system be designed to be tested in that way in the first place --
which unfortunately very few programmers have been trained to do.
"Whaddya mean, make my code modular and independently testable? I've
a deadline by 12am tonight, and I don't have time for that! Just
hardcode the data into the global variables and get the product out
the door before the midnight bell strikes; who cares if this thing is
testable, as long as the customer thinks it looks like it works!"
My experience tells me that this pays off in a matter of days. Days as in less than a week. Doing the hacky stuff feels like it is faster, but measurement says otherwise.
Days? It pays off in *minutes* IME. When I first started using unittest blocks in D, the quality of my code improved *instantly*. Nasty bugs (caused by careless mistakes) were caught immediately rather than the next day after ad hoc manual testing (that also misses 15 other bugs that automated testing would've caught).

This is the point I was trying to get at: manual testing is tedious and error-prone, because humans are no good at repetitive processes. It's too boring, and causes us to take shortcuts, thus missing out on retesting critical bits of code that may just happen to have acquired bugs since the last code change. But you *need* repetitive testing to ensure the new code didn't break the old, so some kind of unittesting framework is mandatory. Otherwise tons of bugs get introduced silently and bite you at the most inopportune time (like when a major customer just deployed it in their production environment).

D's unittests may have their warts, but the fact that they are (1) written in D, and thus encourage copious tests and *up-to-date* tests, and (2) automated when compiling with -unittest (which I'd recommend to be a default flag during development), singlehandedly addresses the major points of automated testing already.

I've seen codebases where unittests were in a pariah class of "run it if you dare, don't pay attention to the failures 'cos we think they're irrelevant, 'cos the test cases are outdated", or "that's QA's job, it's not our department". Totally defeats the purpose.

Tests should be (1) automatically run *by default* during development, and (2) kept up-to-date. Point (2) is especially hard when the unittesting framework isn't built into the language, because nobody wants to shift gears to write tests when they could be "more productive" cranking out code (or at least, that's the perception). The result is that the tests are outdated, and the programmers stop paying attention to failing tests just like they ignore compiler warnings. D does it right for both points, even if people complain about issues with selective testing, etc.


T

-- 
The fact that anyone still uses AOL shows that even the presence of options doesn't stop some people from picking the pessimal one. - Mike Ellis
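As a concrete (hypothetical) example of that workflow, the tests live right next to the code they exercise:

// checks.d -- invented example of tests living next to the code.
size_t countSpaces(string s)
{
    size_t n;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

unittest
{
    assert(countSpaces("") == 0);         // boundary case: empty input
    assert(countSpaces("a b c") == 2);
}

Building with something like "dmd -unittest -main -run checks.d" runs the unittest blocks before main, so a careless mistake in countSpaces shows up the moment the module is compiled and run, not the next day.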
Oct 30 2013
parent "Wyatt" <wyatt.epp gmail.com> writes:
On Thursday, 31 October 2013 at 03:27:23 UTC, H. S. Teoh wrote:
 Wow. No bug tracker?? That's just insane. How do they keep 
 track of anything??
That describes my day job. To answer: we kind of...don't. ¬_¬ I'm in legacy maintenance too, so the lack of documentation of even known issues is incredibly frustrating. I'm trying to change that, but there's a lot of inertia from the people who've been around for 20+ years. Forget testing; just figuring out the maintainer for a tree is an adventure. -Wyatt
Oct 31 2013
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, October 29, 2013 19:14:50 Walter Bright wrote:

 3. Beating out of engineers the hubris that "this part I designed will never
 fail!" Jeepers, how often I've heard that.
It makes me think of a manager where I work who was happy that one of the projects had no bugs reported on it by the testers, whereas we thought that it was horrible. We _knew_ that there were bugs (there's no way that there weren't), but they weren't being reported. So, we thought that the lack of bug reports was a horrible sign, whereas he thought that it meant that the product was in good shape.

Going to the extreme of assuming that something that you wrote won't fail is even worse. I don't trust even the stuff that I tested to death to be bug-free, and that's not even taking into account the possibility of the assumptions that it's using falling apart (e.g. the underlying system calls ceasing to function properly for some reason) or hardware failures (which will happen eventually). No program will run forever or perfectly (especially one of any real complexity), and no hardware will last forever. That's a given, and it's sad to see a trained engineer thinking otherwise.

- Jonathan M Davis
Oct 30 2013
parent "Colin Grogan" <grogan.colin gmail.com> writes:
On Thursday, 31 October 2013 at 04:24:42 UTC, Jonathan M Davis 
wrote:
 On Tuesday, October 29, 2013 19:14:50 Walter Bright wrote:

 That's a given, and it's sad to see a trained engineer thinking 
 otherwise.

 - Jonathan M Davis
I'd begin to question the value of that "training" :)
Oct 31 2013
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Tuesday, 29 October 2013 at 21:39:59 UTC, Walter Bright wrote:
 On 10/29/2013 2:38 PM, Walter Bright wrote:
 I wrote one for DDJ a few years back, "Safe Systems from 
 Unreliable Parts". It's
 probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
Good man yourself! I still can't get my head around the fact that companies fail to provide safety switches that either hand over the control (to humans) or at least disable the software based components completely by switching the machine off. I always try to convince people (who don't program themselves) that they shouldn't trust software, especially when it comes to safety. Well, it seems like your old Dodge (?) is still the safest option.
Oct 29 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/29/2013 3:20 PM, Chris wrote:
 Well, it seems like your old Dodge (?) is still the safest option.
:-)
Oct 29 2013
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 29/10/13 23:20, Chris wrote:
 Good man yourself! I still can't get my head around the fact that companies
fail
 to provide safety switches that either hand over the control (to humans) or at
 least disable the software based components completely by switching the machine
 off.
All too often, the reason why management decides to use software to perform tasks is because they don't trust their employees to do anything. It's a mystery to me why they don't start by finding employees they _do_ trust ... :-)
Oct 29 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 30 October 2013 at 00:16:10 UTC, Joseph Rushton 
Wakeling wrote:
 On 29/10/13 23:20, Chris wrote:
 Good man yourself! I still can't get my head around the fact 
 that companies fail
 to provide safety switches that either hand over the control 
 (to humans) or at
 least disable the software based components completely by 
 switching the machine
 off.
All too often, the reason why management decides to use software to perform tasks is because they don't trust their employees to do anything. It's a mystery to me why they don't start by finding employees they _do_ trust ... :-)
These are expensive, and you got to treat them well!
Oct 30 2013
prev sibling next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 10/29/13 5:15 PM, Joseph Rushton Wakeling wrote:
 On 29/10/13 23:20, Chris wrote:
 Good man yourself! I still can't get my head around the fact that companies
fail
 to provide safety switches that either hand over the control (to humans) or at
 least disable the software based components completely by switching the machine
 off.
All too often, the reason why management decides to use software to perform tasks is because they don't trust their employees to do anything. It's a mystery to me why they don't start by finding employees they _do_ trust ... :-)
As long as you're relying on trust, you're in trouble. Trust and verify. Of course, you have to trust the verification, but that trust can in turn be validated (harder to falsify stress-to-failure results than "yeah, it'll work" assertions). It's part of why testing exists.
Oct 29 2013
parent "Joseph Rushton Wakeling" <joseph.wakeling webdrake.net> writes:
On Wednesday, 30 October 2013 at 00:28:28 UTC, Brad Roberts wrote:
 As long as you're relying on trust, you're in trouble.  Trust 
 and verify.  Of course, you have to trust the verification, but 
 that trust can in turn be validated (harder to falsify 
 stress-to-failure results than "yeah, it'll work" assertions).  It's 
 part of why testing exists.
Of course -- in fact, verification serves to enhance and sustain trust, the two are complementary. But not relying on blind trust doesn't make it any less daft to employ people you don't have trust in.
Oct 29 2013
prev sibling parent reply "Joakim" <joakim airpost.net> writes:
On Tuesday, 29 October 2013 at 22:20:08 UTC, Chris wrote:
 On Tuesday, 29 October 2013 at 21:39:59 UTC, Walter Bright 
 wrote:
 On 10/29/2013 2:38 PM, Walter Bright wrote:
 I wrote one for DDJ a few years back, "Safe Systems from 
 Unreliable Parts". It's
 probably scrolled off their system.
http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
Good man yourself! I still can't get my head around the fact that companies fail to provide safety switches that either hand over the control (to humans) or at least disable the software based components completely by switching the machine off.
Heh, this reminded me of my current ultrabook, the Zenbook Prime UX31A, which is an absolutely fantastic machine, the best I've ever owned, but whose designers made the unfortunate decision to make the power button just another key on the keyboard, as opposed to hard-wiring it directly to the battery. Combine that with the fact that the keyboard connector doesn't hold its place well and is actually held in place by masking tape:

http://www.ifixit.com/Guide/Unresponsive+Keyboard+Keys/11932

Cut to me late last year, unable to turn my ultrabook on because the keyboard connector had completely slipped out, a month after I had accidentally dropped it. I had to find the linked instructions after a bunch of googling, go pick up a Torx T5, and fix it myself, as Asus support kept insisting to everyone that it was a software issue and that they should either reinstall the drivers or the OS! I followed those simple instructions instead and no problems till a week ago, when I had to repeat the procedure again. :)
Oct 29 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/29/2013 8:59 PM, Joakim wrote:
 Heh, this reminded me of my current ultrabook, the Zenbook Prime UX31A,
 which is an absolutely fantastic machine, the best I've ever owned, but
 whose designers made the unfortunate decision to make the power button
 just another key on the keyboard, as opposed to hard-wiring it directly
 to the battery.  Combine that with the fact that the keyboard connector
 doesn't hold its place well and is actually held in place by masking 
tape:
 http://www.ifixit.com/Guide/Unresponsive+Keyboard+Keys/11932

 Cut to me late last year, unable to turn my ultrabook on because the
 keyboard connector had completely slipped out, a month after I had
 accidentally dropped it.  I had to find the linked instructions after a
 bunch of googling, go pick up a Torx T5, and fix it myself, as Asus
 support kept insisting to everyone that it was a software issue and that
 they should either reinstall the drivers or the OS!  I followed those
 simple instructions instead and no problems till a week ago, when I had
 to repeat the procedure again. :)
I'm still irritated that laptop manufacturers have gone the cheap route of replacing the physical "disconnect the wireless antennas" switch with software-based keyboard combinations.

Actually, much more than that, I'm *really* annoyed at the elimination of physical, hardware-based speaker volume controls in favor of purely-software volume controls that do whatever the hell they want, whenever they feel like it, and don't even work *at all* under many basic circumstances (a hardware volume control works *even when the device is off*. Try writing an app to do THAT!):

http://semitwist.com/articles/article/view/it-takes-a-special-kind-of-stupid-to-screw-up-volume-controls

Another example of the worthlessness of software volumes is the stereo in my mom's 2010 Hyundai Elantra: Every time the car is turned on, the radio comes on at a factory-determined volume, *regardless* of how you left the volume and on/off state when you last turned the car off. And then if you immediately turn the radio back off, it will *automatically turn it back on AGAIN*. Stupid motherfuckers actually claimed this was a "convenience feature". Idiocy at its finest.

Hardware controls *CAN'T* fuck things up that freaking badly. But software opens the door to all manner of colossally bad design blunders.

By contrast, I *love* my Prizm's stereo. Everything always does exactly what I tell it. I don't have to turn the car on and let some OS boot before I can turn the volume down. I can *feel* all the buttons and use them without taking my eyes off the road. And unlike the Elantra's stereo, it's never crashed back to a Windows CE desktop.
Oct 31 2013
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 29, 2013 at 02:38:38PM -0700, Walter Bright wrote:
 On 10/29/2013 2:22 PM, qznc wrote:
[...]
Maybe you should write an article about "Failsafe Design Principles"?
Some quick googleing did not turn up anything useful. Only horror
stories and anti-examples.
I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's probably scrolled off their system.
It's the first google result when searching for the title: http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716 T -- Freedom of speech: the whole world has no right *not* to hear my spouting off!
Oct 29 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/29/2013 6:02 PM, H. S. Teoh wrote:
 On Tue, Oct 29, 2013 at 02:38:38PM -0700, Walter Bright wrote:
 On 10/29/2013 2:22 PM, qznc wrote:
[...]
 Maybe you should write an article about "Failsafe Design Principles"?
 Some quick googleing did not turn up anything useful. Only horror
 stories and anti-examples.
I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts". It's probably scrolled off their system.
It's the first google result when searching for the title: http://www.drdobbs.com/architecture-and-design/safe-systems-from-unreliable-parts/228701716
Google has no such thing as a first result for a given search string. Hasn't for a loooong time. Better to use startpage.com
Oct 31 2013
prev sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Tue, 2013-10-29 at 14:38 -0700, Walter Bright wrote:
[…]
 I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts".
It's 
 probably scrolled off their system.
Update it and republish somewhere. Remember the cool hipsters think if it is over a year old it doesn't exist. And the rest of us could always do with a good reminder of quality principles. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Oct 30 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/30/2013 3:30 AM, Russel Winder wrote:
 On Tue, 2013-10-29 at 14:38 -0700, Walter Bright wrote:
 […]
 I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts".
It's
 probably scrolled off their system.
Update it and republish somewhere. Remember the cool hipsters think if it is over a year old it doesn't exist. And the rest of us could always do with a good reminder of quality principles.
Good idea. Maybe I should do a followup DDJ article based on the Toyota report.
Oct 30 2013
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/30/2013 6:30 AM, Russel Winder wrote:
 On Tue, 2013-10-29 at 14:38 -0700, Walter Bright wrote:
 […]
 I wrote one for DDJ a few years back, "Safe Systems from Unreliable Parts".
It's
 probably scrolled off their system.
Update it and republish somewhere. Remember the cool hipsters think if it is over a year old it doesn't exist. And the rest of us could always do with a good reminder of quality principles.
The cool hipsters are never going to accept that reliability is more important than their own personal time. Hell, they refuse to even accept that their user's time is more important than their own. They'll never go for safety even if you cram it down their throats. Better to just drum them out of the industry. Or better yet, out of society as a whole. But a repost of the article is a good idea :)
Oct 31 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Take a look at the reddit thread on this:

http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/

Do a search for "failsafe". Sigh.
Oct 29 2013
next sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 30 October 2013 at 03:24:54 UTC, Walter Bright 
wrote:
 Take a look at the reddit thread on this:

 http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/

 Do a search for "failsafe". Sigh.
One of the comments under the original article you posted says "Poorly designed firmware caused unintended operation, lack of driver training made it fatal." So it's the driver's fault, who couldn't possibly know what was going on in that car-gone-mad? To put the blame on the driver is cynicism of the worst kind.

Unfortunately, that's a common (and dangerous) attitude I've come across among programmers and engineers. The user has to adapt to anything they fail to implement or didn't think of. However, machines have to adapt to humans not the other way around (realizing this was part of Apple's success in UI design, Ubuntu is very good now too).

I warmly recommend the book "Architect or Bee":

http://www.amazon.com/Architect-Bee-Human-Technology-Relationship/dp/0896081311/ref=sr_1_1?ie=UTF8&qid=1383127030&sr=8-1&keywords=architect+or+bee
Oct 30 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/30/2013 3:01 AM, Chris wrote:
 On Wednesday, 30 October 2013 at 03:24:54 UTC, Walter Bright wrote:
 Take a look at the reddit thread on this:

 http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/


 Do a search for "failsafe". Sigh.
One of the comments under the original article you posted says "Poorly designed firmware caused unintended operation, lack of driver training made it fatal." So it's the driver's fault, who couldn't possibly know what was going on in that car-gone-mad? To put the blame on the driver is cynicism of the worst kind.
Much effort in cockpit design goes into trying to figure out what the pilot would do "intuitively" and ensuring that that is the right thing to do. Of course, we try to do that with programming language design, too, with varying degrees of success.
 Unfortunately, that's a common (and dangerous) attitude I've come across among
 programmers and engineers. The user has to adapt to anything they fail to
 implement or didn't think of. However, machines have to adapt to humans not the
 other way around (realizing this was part of Apple's success in UI design,
 Ubuntu is very good now too).

 I warmly recommend the book "Architect or Bee":

 http://www.amazon.com/Architect-Bee-Human-Technology-Relationship/dp/0896081311/ref=sr_1_1?ie=UTF8&qid=1383127030&sr=8-1&keywords=architect+or+bee
Oct 30 2013
next sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Wed, 2013-10-30 at 11:12 -0700, Walter Bright wrote:
[…]
 Much effort in cockpit design goes into trying to figure out what the pilot 
 would do "intuitively" and ensuring that that is the right thing to do.
I've no experience with cockpit design, but I am aware of all the HCI work that went into air traffic control in the 1980s and 1990s, especially realizing the safety protocols, which are socio-political systems as much as computer-realized things. This sort of safety work is as much about the context and the human actors as about the computer and software.
 Of course, we try to do that with programming language design, too, with
varying 
 degrees of success.
[…] Has any programming language ever had psychology of programming folk involved from the outset rather than after the fact as a "patch up" activity? -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Oct 30 2013
next sibling parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 30 October 2013 at 18:35:44 UTC, Russel Winder 
wrote:
 On Wed, 2013-10-30 at 11:12 -0700, Walter Bright wrote:
 […]
 Much effort in cockpit design goes into trying to figure out 
 what the pilot would do "intuitively" and ensuring that that 
 is the right thing to do.
I've no experience with cockpit design, but I am aware of all the HCI work that went into air traffic control in the 1980s and 1990s, especially realizing the safety protocols, which are socio-political systems as much as computer-realized things. This sort of safety work is as much about the context and the human actors as about the computer and software.
 Of course, we try to do that with programming language design, 
 too, with varying degrees of success.
[…] Has any programming language ever had psychology of programming folk involved from the outset rather than after the fact as a "patch up" activity?
Ruby, they say. Even if it's only one programmer they based it on. :-)
Oct 30 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 30 October 2013 at 18:35:44 UTC, Russel Winder 
wrote:
 Has any programming language ever had psychology of programming 
 folk
 involved from the outset rather than after the fact as a "patch 
 up"
 activity?
Yes, sadly I can't remember the name. Very insightful project. Probably won't be successful in itself, but a lot can be learned from the experiment.
Oct 30 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/30/2013 11:35 AM, Russel Winder wrote:
 Has any programming language ever had psychology of programming folk
 involved from the outset rather than after the fact as a "patch up"
 activity?
I think they all have. The "patch up" activity comes from discovering that they were wrong :-) One of my favorite anecdotes comes from the standardized jargon used in aviation. When you are ready to take off, you throttle up to max power first. Hence, the standard jargon for firewalling the throttles is "takeoff power". This lasted until an incident where the pilot, coming in for a landing, realized he had to abort the landing and go around. He yelled "takeoff power", and the copilot promptly powered down the engines, causing the plane to stall and crash. "take off power", get it? The standard phrase was then changed to "full power" or "maximum power", I forgot which. This all seems so so obvious in hindsight, doesn't it? But the best minds didn't see it until after there was an accident. This is all too common.
Oct 30 2013
prev sibling parent reply "Adam Wilson" <flyboynw gmail.com> writes:
On Wed, 30 Oct 2013 11:12:48 -0700, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 10/30/2013 3:01 AM, Chris wrote:
 On Wednesday, 30 October 2013 at 03:24:54 UTC, Walter Bright wrote:
 Take a look at the reddit thread on this:

 http://www.reddit.com/r/programming/comments/1pgyaa/toyotas_killer_firmware_bad_design_and_its/


 Do a search for "failsafe". Sigh.
One of the comments under the original article you posted says "Poorly designed firmware caused unintended operation, lack of driver training made it fatal." So it's the driver's fault, who couldn't possibly know what was going on in that car-gone-mad? To put the blame on the driver is cynicism of the worst kind.
Much effort in cockpit design goes into trying to figure out what the pilot would do "intuitively" and ensuring that that is the right thing to do. Of course, we try to do that with programming language design, too, with varying degrees of success.
 Unfortunately, that's a common (and dangerous) attitude I've come  
 across among
 programmers and engineers. The user has to adapt to anything they fail  
 to
 implement or didn't think of. However, machines have to adapt to humans  
 not the
 other way around (realizing this was part of Apple's success in UI  
 design,
 Ubuntu is very good now too).

 I warmly recommend the book "Architect or Bee":

 http://www.amazon.com/Architect-Bee-Human-Technology-Relationship/dp/0896081311/ref=sr_1_1?ie=UTF8&qid=1383127030&sr=8-1&keywords=architect+or+bee
Having experience with a 737 flight deck and a Cessna 172/G1000 flight deck, I can personally say that if even one of the devs on both of those (very different) flight information systems had a clue about HCI he was physically beaten for bringing it up. Yes, the absolute fundamentals might be intuitive (AI, DG, etc.). But if you need anything advanced ... God Help You. I did eventually figure it out (and started helping the instructors at my FBO), but intuitive is NOT the word I would use... There is also a story floating around that the boys (I'll not deign to call them programmers...) at Honeywell FINALLY called in a group of pilots for HCI analysis/critique of the 787 flight management systems months after they had shipped the code to the FAA for certification... And lastly, although it got buried because France needs to protect EADS, there was a "By Design" bug that caused the Angle of Attack indicator to NOT show when AF447 was in deep stall, overridden by the faulty airspeed indication, never mind that this is the ONLY indicator a pilot needs to recover from a stall... If the pilots had seen this when the plane went into its unusual attitude, they could have corrected immediately. Sorry Airbus, but the computer does NOT always know best, it's only as good as the [non-pilot] programmers feeding it code... :-) -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Oct 30 2013
parent "Chris" <wendlec tcd.ie> writes:
On Thursday, 31 October 2013 at 06:32:41 UTC, Adam Wilson wrote:
 Having experience with a 737 flight deck and a Cessna 172/G1000 
 flight deck, I can personally say that if even one of the devs 
 on both of those (very different) flight information systems 
 had a clue about HCI he was physically beaten for bringing it 
 up. Yes, the absolute fundamentals might be intuitive (AI, DG, 
 etc.). But if you need anything advanced ... God Help You. I 
 did eventually figure it out (and started helping the 
 instructors at my FBO), but intuitive is NOT the word I would 
 use...

 There is also a story floating around that the boys (I'll not 
 deign to call them programmers...) at Honeywell FINALLY called 
 in a group of pilots for HCI analysis/critique of the 787 
 flight management systems months after they had shipped the 
 code to the FAA for certification...

 And lastly, although it got buried because France needs to 
 protect EADS, there was a "By Design" bug that caused the Angle 
 of Attack indicator to NOT show when AF447 was in deep stall, 
 overridden by the faulty airspeed indication, never mind that 
 this is the ONLY indicator a pilot needs to recover from a 
 stall... If the pilots had seen this when the plane went into 
 its unusual attitude, they could have 
 corrected immediately.
 Sorry Airbus, but the computer does NOT always know best, it's 
 only as good as the [non-pilot] programmers feeding it code... 
 :-)
I'm still waiting for the day when people will realize this! I always hear users say "Ah, it's been calculated by a computer! It must be correct.", assuming that machines are perfect. I always ask the question "But who builds and programs machines?" Humans, of course. And we are not perfect, far from it. Another story I've heard is that the German revenue service had a clever program that could find out if a shop or pub owner was cheating. The program assumed that if a certain threshold of round numbers (in the colloquial sense of the word) in his/her balances was exceeded, the business owner was cheating. Now, there was one pub owner who only had prices with round numbers (and I know others too), simply because he couldn't be bothered to deal with stupid prices like €4.27 and to always have the right change. This is not uncommon. The programmers at the revenue service had based their stats on all retailers in the country, including supermarkets, department stores, etc.
Oct 31 2013
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 10/30/2013 11:01 AM, Chris wrote:
 "Poorly designed firmware caused unintended operation, lack of driver
  training made it fatal."
 So it's the driver's fault, who couldn't possibly know what was going on
 in that car-gone-mad? To put the blame on the driver is cynicism of the worst
kind.
 Unfortunately, that's a common (and dangerous) attitude I've come across
 among programmers and engineers.
There are also misguided end users who believe there cannot be any other way (and sometimes even believe that the big players in the industry are infallible, and hence the user is to blame for any failure).
 The user has to adapt to anything they
 fail to implement or didn't think of. However, machines have to adapt to
 humans not the other way around (realizing this was part of Apple's
 success in UI design,
AFAIK Apple designs are not meant to be adapted. It seems to be mostly marketing.
 Ubuntu is very good now too).
The distribution is not really indicative of the UI/window manager you'll end up using, so what do you mean?
Oct 30 2013
next sibling parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 30 October 2013 at 21:18:16 UTC, Timon Gehr wrote:
 On 10/30/2013 11:01 AM, Chris wrote:
 "Poorly designed firmware caused unintended operation, lack of 
 driver
 training made it fatal."
 So it's the driver's fault, who couldn't possibly know what 
 was going on
 in that car-gone-mad? To put the blame on the driver is 
 cynicism of the worst kind.
 Unfortunately, that's a common (and dangerous) attitude I've 
 come across
 among programmers and engineers.
There are also misguided end users who believe there cannot be any other way (and sometimes even believe that the big players in the industry are infallible, and hence the user is to blame for any failure).
 The user has to adapt to anything they
 fail to implement or didn't think of. However, machines have 
 to adapt to
 humans not the other way around (realizing this was part of 
 Apple's
 success in UI design,
AFAIK Apple designs are not meant to be adapted. It seems to be mostly marketing.
Forget about the marketing campaigns for a moment. Xerox (back in the day) started to develop GUIs (as we know them). The developers later went to Apple. Apple was one of the first companies to go for user experience and try to design things in a more intuitive way, i.e. how humans work and think, what they expect (I know the command line crowd hates GUIs). I'd say that Windows is at the other end of the scale. Try to find info "about this computer" on a Mac and try to find it in (each new version) of Windows. Don't forget that you shut Windows down where it says "Start". Ha ha ha! That said, Apple is going down the wrong way now too, IMO. 10.8 is just annoying in many ways. Too intrusive, too patronising, too much like a prison.
 Ubuntu is very good now too).
The distribution is not really indicative of the UI/window manager you'll end up using, so what do you mean?
Ubuntu is quite good now UI-wise. I recently had a user who found everything almost immediately, although she had never used Ubuntu before, nor does she know much about computers, nor does she like them. That's what I mean by intuitive, i.e. the computer is arranged in the way the human mind works. Things are easy to find and use.
Oct 30 2013
prev sibling next sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 30 October 2013 at 21:18:16 UTC, Timon Gehr wrote:
 On 10/30/2013 11:01 AM, Chris wrote:
 "Poorly designed firmware caused unintended operation, lack of 
 driver
 training made it fatal."
 So it's the driver's fault, who couldn't possibly know what 
 was going on
 in that car-gone-mad? To put the blame on the driver is 
 cynicism of the worst kind.
 Unfortunately, that's a common (and dangerous) attitude I've 
 come across
 among programmers and engineers.
There are also misguided end users who believe there cannot be any other way (and sometimes even believe that the big players in the industry are infallible, and hence the user is to blame for any failure).
I know. A lot of people are like that. But who (mis)guides them? The big PR campaigns by big companies who talk about "safety" and "precision" and give users a false sense of security. Now that I think of it, maybe the fact that they don't have a simple mechanical backup is not because of the engineering culture. Maybe it is to do with the fact that a product might seem less attractive, if the company admits that it can fail by including a backup mechanism.
Oct 30 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 30/10/13 23:31, Chris wrote:
 I know. A lot of people are like that. But who (mis)guides them? The big PR
 campaigns by big companies who talk about "safety" and "precision" and give
 users a false sense of security. Now that I think of it, maybe the fact that
 they don't have a simple mechanical backup is not because of the engineering
 culture. Maybe it is to do with the fact that a product might seem less
 attractive, if the company admits that it can fail by including a backup
mechanism.
I'll play devil's advocate here, if nothing else because I'm curious what Walter's response may be ... :-) One of the things that makes a car different from an aeroplane is that pilots form a relatively small group of highly-trained people. Car drivers get trained, but not to a very high level. So, in those circumstances, any control you put in the vehicle needs to be confronted with at least four questions -- "What are the expected benefits if this control needs to be used and is used correctly?" "What are the expected problems if this control doesn't need to be used, but is used anyway?" "What's the likelihood of a situation arising where the control needs to be used?" "What's the likelihood that the driver can correctly distinguish when it needs to be used -- what are the expected false positives and false negatives?" The point being that a manual override in the hands of the average driver could in fact _increase_ the risk of an accident because the most typical outcome is a driver engaging it incorrectly.
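To make the trade-off concrete, here is a toy expected-harm comparison in D. Every number below is invented purely for illustration (the per-trip probabilities, the harm values, the driver's hit rate); the point is only the shape of the reasoning behind those four questions, not the figures.

    import std.stdio;

    void main()
    {
        // Hypothetical, illustrative numbers only (per trip, arbitrary harm units).
        double pNeeded        = 1e-6;  // a situation where the override would help
        double pUsedCorrectly = 0.3;   // driver engages it correctly when needed
        double pMisused       = 1e-4;  // driver engages it when it is not needed
        double harmUnhandled  = 100.0; // harm if the situation occurs and is not handled
        double harmMisuse     = 10.0;  // harm caused by an unnecessary engagement

        double withOverride = pNeeded * (1 - pUsedCorrectly) * harmUnhandled
                            + pMisused * harmMisuse;
        double withoutOverride = pNeeded * harmUnhandled;

        writefln("expected harm with override:    %g", withOverride);
        writefln("expected harm without override: %g", withoutOverride);
    }

With these made-up numbers the override comes out roughly ten times worse, because misuse dominates; with a more capable user population the conclusion flips. That is exactly what the four questions are probing.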
Oct 31 2013
parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 31 October 2013 at 12:32:48 UTC, Joseph Rushton 
Wakeling wrote:


 The point being that a manual override in the hands of the 
 average driver could in fact _increase_ the risk of an accident 
 because the most typical outcome is a driver engaging it 
 incorrectly.
I wonder how people could drive 25 years ago!? No software on board. Gosh they must have been geniuses! ;)
Oct 31 2013
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 31 October 2013 at 12:49:23 UTC, Chris wrote:
 On Thursday, 31 October 2013 at 12:32:48 UTC, Joseph Rushton 
 Wakeling wrote:


 The point being that a manual override in the hands of the 
 average driver could in fact _increase_ the risk of an 
 accident because the most typical outcome is a driver engaging 
 it incorrectly.
I wonder how people could drive 25 years ago!? No software on board. Gosh they must have been geniuses! ;)
They crashed into things. Now, with ABS or path correction, they do so less.
Oct 31 2013
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 31/10/13 13:49, Chris wrote:
 I wonder how people could drive 25 years ago!? No software on board. Gosh they
 must have been geniuses! ;)
"At greater risk of an accident" != "Incapable of driving" ;-)
Oct 31 2013
parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 31 October 2013 at 13:32:19 UTC, Joseph Rushton 
Wakeling wrote:
 On 31/10/13 13:49, Chris wrote:
 I wonder how people could drive 25 years ago!? No software on 
 board. Gosh they
 must have been geniuses! ;)
"At greater risk of an accident" != "Incapable of driving" ;-)
Fair enough. Well, I was once driving the company's BMW and the ABS etc. saved me when I came to a road that was iced over. I had no experience at all with the car. However, if BMW didn't have that stupid rear-wheel drive, I would have been fine anyway. Front-wheel drive is much safer, especially in wet and icy conditions. The danger is that people overestimate the power of these technologies and rely too much on them (which leads to dangerous situations on the road). I'm glad I learned how to drive when cars were still cars and not little space ships.
Oct 31 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 31, 2013 at 03:26:31PM +0100, Chris wrote:
 On Thursday, 31 October 2013 at 13:32:19 UTC, Joseph Rushton
 Wakeling wrote:
On 31/10/13 13:49, Chris wrote:
I wonder how people could drive 25 years ago!? No software on board.
Gosh they must have been geniuses! ;)
"At greater risk of an accident" != "Incapable of driving" ;-)
Fair enough. Well, I was once driving the company's BMW and the ABS etc. saved me when I came to a road that was iced over. I had no experience at all with the car. However, if BMW didn't have that stupid rear-wheel drive, I would have been fine anyway. Front-wheel drive is much safer, especially in wet and icy conditions. The danger is that people overestimate the power of these technologies and rely too much on them (which leads to dangerous situations on the road). I'm glad I learned how to drive when cars were still cars and not little space ships.
ABS is certainly a helpful thing, especially in inclement conditions like snow/ice. But it's far from perfect. Once, I was going downhill on an icy road and suddenly started to skid dangerously close to the car in front of me. The ABS kicked in when I slammed the brakes, but it couldn't regain traction on the ice. At the last moment, I manually pumped the brakes and managed to come to a shaky stop inches before I hit the car in front. You don't know how thankful I am for having learnt the concept of pumping the brakes, ABS or not. I'm afraid too many driving instructors nowadays just advise slamming the brakes and relying on the ABS to do the job. It doesn't *always* work! T -- Nobody is perfect. I am Nobody. -- pepoluan, GKC forum
Oct 31 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/31/2013 7:57 AM, H. S. Teoh wrote:
 You don't know how thankful I am for having learnt the concept of
 pumping the brakes, ABS or not. I'm afraid too many driving instructors
 nowadays just advise slamming the brakes and relying on the ABS to do
 the job. It doesn't *always* work!
Pumping the brakes is not how to get the max braking effect. The way to do it is to push on the pedal to about 70-80% of braking force. This causes the car to push its weight onto the front tires and load them up. Then go 100%. You'll stop a lot faster, because with more weight on the front tires they have more grip. (I think this is called 2 step braking.) You lose about 30% of braking force when the tires break loose. The trick is to press the pedal just short of that happening, which can be found with a bit of practice. The downside of just slamming the brakes on and letting the ABS take care of it is you lose the 2-step effect. There are also cases where you *want* to lock the tires. That case is when you're in a skid and the car is at a large angle relative to its velocity vector. This will cause the car to slide in a straight line, meaning that other cars can avoid you. If you don't lock the wheels, the wheels can arbitrarily "grab" and shoot the car off in an unexpected direction - like over the embankment, or into the car that was dodging you. The car will also stop faster than if the wheels suddenly grab when you're at a 30 degree angle. But yeah, I'd guess less than 1% of drivers know this stuff. And even if you know it, you have to practice it now and then to be proficient at it.
Oct 31 2013
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 31 October 2013 at 19:45:17 UTC, Walter Bright wrote:
 But yeah, I'd guess less than 1% of drivers know this stuff. 
 And even if you know it, you have to practice it now and then 
 to be proficient at it.
On a lot of cars, there's a hidden switch to turn off ABS. It's recommended (AFAIK) to do so if you expect you'll be driving on an unsound surface (ice, snow, pebbles...)
Oct 31 2013
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 31 October 2013 at 19:45:17 UTC, Walter Bright wrote:
 On 10/31/2013 7:57 AM, H. S. Teoh wrote:
 You don't know how thankful I am for having learnt the concept 
 of
 pumping the brakes, ABS or not. I'm afraid too many driving 
 instructors
 nowadays just advise slamming the brakes and relying on the 
 ABS to do
 the job. It doesn't *always* work!
Pumping the brakes is not how to get the max braking effect. The way to do it is to push on the pedal to about 70-80% of braking force. This causes the car to push its weight onto the front tires and load them up. Then go 100%. You'll stop a lot faster, because with more weight on the front tires they have more grip. (I think this is called 2 step braking.) You lose about 30% of braking force when the tires break loose. The trick is to press the pedal just short of that happening, which can be found with a bit of practice. The downside of just slamming the brakes on and letting the ABS take care of it is you lose the 2-step effect. There are also cases where you *want* to lock the tires. That case is when you're in a skid and the car is at a large angle relative to its velocity vector. This will cause the car to slide in a straight line, meaning that other cars can avoid you. If you don't lock the wheels, the wheels can arbitrarily "grab" and shoot the car off in an unexpected direction - like over the embankment, or into the car that was dodging you. The car will also stop faster than if the wheels suddenly grab when you're at a 30 degree angle. But yeah, I'd guess less than 1% of drivers know this stuff. And even if you know it, you have to practice it now and then to be proficient at it.
I still think that, although software can help to make things safer, common sense should not be turned off while driving. If it's raining heavily, slow down. Sounds simple, but many drivers don't do it, with or without software.
Nov 01 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 11/1/2013 8:47 AM, Chris wrote:
 I still think that, although software can help to make things safer,
 common sense should not be turned off while driving. If it's raining
 heavily, slow down. Sounds simple, but many drivers don't do it, with or
 without software.
I've noticed that the worse the driving conditions are, the more tailgating jackasses I have to put up with.
Nov 02 2013
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/30/2013 5:18 PM, Timon Gehr wrote:
 On 10/30/2013 11:01 AM, Chris wrote:
 "Poorly designed firmware caused unintended operation, lack of driver
  training made it fatal."
 So it's the driver's fault, who couldn't possibly know what was going on
 in that car-gone-mad? To put the blame on the driver is cynicism of
 the worst kind.
 Unfortunately, that's a common (and dangerous) attitude I've come across
 among programmers and engineers.
There are also misguided end users who believe there cannot be any other way (and sometimes even believe that the big players in the industry are infallible, and hence the user is to blame for any failure).
I have a deep hatred for such people. (I've come across far too many.)
 The user has to adapt to anything they
 fail to implement or didn't think of. However, machines have to adapt to
 humans not the other way around (realizing this was part of Apple's
 success in UI design,
AFAIK Apple designs are not meant to be adapted. It seems to be mostly marketing.
This is very true (at least for Apple's "Return of Jobs" era). And it's not surprising: Steve Jobs had a notoriously heavy hand in Apple's designs and yet Jobs himself has never, realistically, been much of anything more than a glorified salesman. The company was literally being run by a salesman. And that easily explains both the popularity and the prevalence of bad design.
 Ubuntu is very good now too).
The distribution is not really indicative of the UI/window manager you'll end up using, so what do you mean?
Ordinarily, yes, but I would think there'd be an uncommonly strong correlation between Ubuntu users and Unity users.
Oct 31 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
And the slashdot version:

http://tech.slashdot.org/story/13/10/29/208205/toyotas-killer-firmware
Oct 30 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 29 Oct 2013 20:38:08 -0000, Walter Bright  
<newshound2 digitalmars.com> wrote:

 https://news.ycombinator.com/item?id=6636811

 I know that everyone is tired of hearing my airframe design stories, but  
 it's obvious to me that few engineers understand the principles of  
 failsafe design. This article makes that abundantly clear - and the  
 consequences of paying no attention to it.

 You can add in Fukishima and Deepwater Horizon as more costly examples  
 of ignorance of basic failsafe design principles.

 Yeah, I feel strongly about this.
One safety mechanism was all that saved North Carolina: www.youtube.com/watch?v=SHZAaGidUbg&t=2m58s R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Oct 31 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 ...
Everyone who writes safety critical software should really avoid languages unable to detect integral overflows (at compile-time or run-time) in all normal numerical operations, and languages that have undefined operations in their basic semantics. So Ada language is OK, C and D are not OK for safety critical software. Bye, bearophile
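To make the wraparound point concrete, here is a minimal D sketch: the plain addition wraps silently, while an explicit check makes the overflow observable. It uses druntime's core.checkedint helpers; a complete solution would of course need this on every operation, which is the point.

    import core.checkedint : adds;
    import std.stdio;

    void main()
    {
        int a = int.max;
        int b = 1;

        int wrapped = a + b;   // silently wraps around to int.min, no diagnostic
        writeln(wrapped);

        bool overflow = false;
        int checked = adds(a, b, overflow);   // same addition, but the overflow is reported
        writeln(overflow ? "overflow detected" : "no overflow", " (result: ", checked, ")");
    }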
Nov 01 2013
next sibling parent reply "eles" <eles eles.com> writes:
On Friday, 1 November 2013 at 15:03:47 UTC, bearophile wrote:
 Walter Bright:
 avoid languages unable to detect integral overflows (at 
 compile-time or run-time) in all normal numerical operations,
Yeah, after the scope() statement, this is the thing that I'd want the most in my C, at least during debugging and testing. At least for some variables that we assume should never overflow (and most of them are like that).
Nov 01 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
eles:

 Yeah, after the scope() statement, this is the thing that I'd 
 want the most in my C,
The latest versions of Clang are able to catch some integral overflows at run-time. It is of course only a small part of the whole amount of things needed to produce correct code, but for high integrity software it's useful. Bye, bearophile
Nov 01 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/1/2013 8:03 AM, bearophile wrote:
 Everyone who writes safety critical software should really avoid languages
 unable to detect integral overflows (at compile-time or run-time) in all normal
 numerical operations, and languages that have undefined operations in their
 basic semantics.

 So Ada language is OK, C and D are not OK for safety critical software.
I think you're missing the point. Improving the quality of the software is not the answer to making fail safe systems.
Nov 01 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 I think you're missing the point. Improving the quality of the 
 software is not the answer to making fail safe systems.
To make high integrity software you have to start with reliable tools, and then use the right testing methodologies, sometimes you have to write down proofs, then you have to add redundancy, use the right policies in the firm that writes the software, etc. Improving the quality of the language is not enough, but it's useful. You have to face the reliability problem from all sides at the same time. All subsystems can fail, but to make a reliable system you don't start by building your whole system out of the least reliable sub-parts you can find on the market. You use "good" components and good strategies at all levels. Bye, bearophile
Nov 02 2013
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/02/2013 10:55 AM, bearophile wrote:
 I think you're missing the point. Improving the quality of the
 software is not the answer to making fail safe systems.
To make high integrity software you have to start with reliable tools, and then use the right testing methodologies, sometimes you have to write down proofs,
Well, if there is a formal proof of correctness, checking for overflow at runtime is as pointless as limiting oneself to a language without undefined behaviour in its basic semantics.
Nov 02 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Timon Gehr:

 Well, if there is a formal proof of correctness, checking for 
 overflow at runtime is as pointless as limiting oneself to a 
 language without undefined behaviour in its basic semantics.
Perhaps you remember the funny quote by Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it." Proofs of correctness are usually limited to certain subsystems, because writing a proof for the whole system is usually too hard, or takes too much time/money, or because you have to interface with modules written by subcontractors. Often there is an "outer world" you are not including in your proof. This is a common strategy for systems written in Spark-Ada, and this is done even by NASA (and indeed was the cause of one famous crash of a NASA probe). So you rely on the language and simpler means to assure the safety of interfaces between the sections of the program. Another purpose for the run-time tests is to guard against random bit flips, caused by radiation, cosmic rays, interference, thermal noise, etc. Such run-time tests are present in the probes on Mars, because even space-hardened electronics sometimes errs (and relying on other back-up means is not enough, as I have explained in the preceding post). And generally in high integrity systems you don't want to use a language with undefined behaviour in its basic constructs because such language is harder to test for the compiler writer too. If you take a look at blogs that today discuss the semantics of C programs you see there are cases where the C standard is ambiguous and the GCC/Intel/Clang compilers behave differently. This is a fragile foundation you don't want to build high integrity software on. It took decades to write a certified C compiler. Bye, bearophile
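As a rough sketch of the "simpler means at the interfaces" idea, using D's contracts; the function, the units and the limits below are entirely made up, and contracts are compiled out in -release builds, so a shipped high integrity system would use always-on checks instead.

    import std.stdio;

    // Run-time check at a module boundary: whatever was (or was not) proven
    // about the caller, an out-of-range command is caught right here.
    double commandGimbalAngle(double degrees)
    in
    {
        assert(degrees >= -15.0 && degrees <= 15.0, "gimbal command out of range");
    }
    do
    {
        // ... drive the actuator (omitted) ...
        return degrees;
    }

    void main()
    {
        writeln(commandGimbalAngle(3.5));   // fine
        // commandGimbalAngle(90.0);        // would fail the in-contract at run time
    }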
Nov 02 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/02/2013 01:56 PM, bearophile wrote:
 Timon Gehr:

 Well, if there is a formal proof of correctness, checking for overflow
 at runtime is as pointless as limiting oneself to a language without
 undefined behaviour in its basic semantics.
Perhaps you remember the funny quote by Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it." Proofs of correctness are usually limited to certain subsystems, ...
As long as additional ad-hoc techniques for error avoidance are fruitful, the formal proof of correctness does not cover enough cases. (Of course, one may not be able to construct such a proof.)
 This is a common strategy for systems written in Spark-Ada, and this
 is done even by NASA (and indeed was the cause of one famous crash of a NASA
probe).
Well, I think it is funny to consider a methodology adequate in hindsight if it has resulted in a crash. Have the techniques advocated by Walter been thoroughly applied in this project?
 And generally in high integrity systems you don't want to use a language
 with undefined behaviour in its basic constructs because such language
 is harder to test for the compiler writer too. If you take a look at
 blogs that today discuss the semantics of C programs you see there are
 cases where the C standard is ambiguous and the GCC/Intel/Clang
 compilers behave differently.
None of those compilers is proven correct.
 This is a fragile foundation you don't want to build high integrity software
on.
Therefore, obviously.
 It took decades to write a certified C compiler.
No, it took four years. CompCert was started in 2005.
Nov 02 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/2/2013 6:59 AM, Timon Gehr wrote:
 Well, I think it is funny to consider a methodology adequate in hindsight if it
 has resulted in a crash. Have the techniques advocated by Walter been
thoroughly
 applied in this project?
One downside of system redundancy is it adds weight, and spacecraft are catastrophically sensitive to weight. When space probes fail, they don't kill people. So while the failures cost money and are embarrassing, the weight penalty of redundancy may have meant the mission wasn't practical in the first place. Tradeoffs, tradeoffs. I don't know much about failsafe redundancy in, for example, Mars probes. I have seen discussions about the lack of failsafes in many aspects of the Shuttle design. They are well known tradeoffs, though, and they know the risks. Nobody has even figured out how to make failsafe helicopter rotor blades. Instead, they opt for expensive maintenance and inspections. If a rotor blade fails, the helicopter crashes and kills everyone aboard.
Nov 02 2013
prev sibling parent reply "qznc" <qznc web.de> writes:
On Saturday, 2 November 2013 at 13:59:53 UTC, Timon Gehr wrote:
 It took decades to write a certified C compiler.
No, it took four years. CompCert was started in 2005.
CompCert is verified, not certified. While it is impressive work, is anybody using it in production?
Nov 04 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/04/2013 06:44 PM, qznc wrote:
 On Saturday, 2 November 2013 at 13:59:53 UTC, Timon Gehr wrote:
 It took decades to write a certified C compiler.
No, it took four years. CompCert was started in 2005.
CompCert is verified, not certified. ...
? http://ncatlab.org/nlab/show/certified+programming http://adam.chlipala.net/cpdt/html/Intro.html
Nov 04 2013
parent "qznc" <qznc web.de> writes:
On Monday, 4 November 2013 at 20:26:18 UTC, Timon Gehr wrote:
 On 11/04/2013 06:44 PM, qznc wrote:
 On Saturday, 2 November 2013 at 13:59:53 UTC, Timon Gehr wrote:
 It took decades to write a certified C compiler.
No, it took four years. CompCert was started in 2005.
CompCert is verified, not certified. ...
? http://ncatlab.org/nlab/show/certified+programming http://adam.chlipala.net/cpdt/html/Intro.html
Interesting definition. Seems not totally clear in the literature. For me "certified" is always connected to some standard documents. For example, the "Wind River Diab Compiler" [0] is certified for DO-178B, IEC 60880, EN 50128, yada yada. This certification is usually done by manual inspection and lots of testing. On the other hand, "verified" means (hopefully formal) proof. CompCert itself [1] usually talks about "verification". As far as I know, CompCert is not certified like the Wind River product above. Hence, you are not allowed to use it in certain high-safety applications (automotive, avionics, etc.). [0] http://www.windriver.com/products/development_suite/wind_river_compiler/diab-details-4.html [1] http://compcert.inria.fr/
Nov 05 2013
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 02/11/13 10:55, bearophile wrote:
 To make high integrity software you have to start with reliable tools
I know what you're saying, but there is an inherent assumption in the concept of "reliable tools". So far as I can see the important thing is to assume that _nothing_ in the system is reliable, and that anything can fail. If you rely on the language or on the compiler to detect integral overflows, you're not necessarily safer -- your safety rests on the assumption that the compiler will implement these things correctly, and will ALWAYS do so regardless of circumstances. How can you tell if the automated integral overflow checking is working as it should? And even if it is a high-quality implementation, how do you protect yourself against extreme pathological cases which may arise in very rare circumstances? "Necessary but not sufficient" seems a good phrase to use here.
Nov 02 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 11/2/2013 8:09 AM, Joseph Rushton Wakeling wrote:
 On 02/11/13 10:55, bearophile wrote:
 To make high integrity software you have to start with reliable tools
I know what you're saying, but there is an inherent assumption in the concept of "reliable tools". So far as I can see the important thing is to assume that _nothing_ in the system is reliable, and that anything can fail.
"Reliable" of course simply meaning "less unreliable".
 If you rely on the language or on the compiler to detect integral
 overflows, you're not necessarily safer -- your safety rests on the
 assumption that the compiler will implement these things correctly, and
 will ALWAYS do so regardless of circumstances.
It still helps and is therefore worthwhile. Nobody's claiming that runtime overflow checks were sufficient to ensure reliability, only that *not* having them can be a bad idea.
Nov 02 2013
prev sibling parent reply "eles" <eles eles.com> writes:
On Saturday, 2 November 2013 at 04:03:46 UTC, Walter Bright wrote:
 On 11/1/2013 8:03 AM, bearophile wrote:
 I think you're missing the point. Improving the quality of the 
 software is not the answer to making fail safe systems.
Well, OTOH, worsening the software won't really increase the reliability of the system.
Nov 05 2013
parent reply "growler" <growlercab gmail.com> writes:
On Tuesday, 5 November 2013 at 08:41:17 UTC, eles wrote:
 On Saturday, 2 November 2013 at 04:03:46 UTC, Walter Bright 
 wrote:
 On 11/1/2013 8:03 AM, bearophile wrote:
 I think you're missing the point. Improving the quality of the 
 software is not the answer to making fail safe systems.
Well, OTOH, worsening the software won't really increase the reliability of the system.
Fail safe design needs to be engineered to handle the situation when any component fails regardless of the quality of components used. Software is just one more (weak) component in the system. Of course component quality is important to overall safety because fail safe systems are not foolproof. But as Walter says it should not be part of the solution nor relied upon in a fail safe design.
Nov 05 2013
parent "eles" <eles eles.com> writes:
On Wednesday, 6 November 2013 at 01:52:30 UTC, growler wrote:
 On Tuesday, 5 November 2013 at 08:41:17 UTC, eles wrote:
 On Saturday, 2 November 2013 at 04:03:46 UTC, Walter Bright 
 wrote:
 On 11/1/2013 8:03 AM, bearophile wrote:
Fail safe design needs to be engineered to handle the situation when any component fails regardless of the quality of components used. Software is just one more (weak) component in the system.
Yes, but you cannot get to zero probability unless you use an infinite number of back-ups. Otherwise, there is some infinitesimal, but non-zero, probability that everything fails. You take two teams that develop software independently, in different languages, on different machine architectures, etc. However, there is a non-zero probability that both teams (or compilers, or processors, or all of that) expose the same bug, or that the arbiter that counts the votes has some error. In designing failsafe systems *you rely* on something, because you have no choice. But yes, you go as pessimistic as possible (usually, limited by the budget). Hardware can fail mostly for the same reasons that software fails. The difference, in the long term, is that once a piece of software is 100% correct, it will never get worse. The hardware can be in good shape today and badly broken tomorrow. Just have a look at Curiosity's digger.
 Of course component quality is important to overall safety 
 because fail safe systems are not foolproof. But as Walter says 
 it should not be part of the solution nor relied upon in a fail 
 safe design.
As said earlier, you cannot go as extreme as that. You don't rely on any specific part, but you rely on a combination of parts, and you simply bet on the fact that their probability of independent but simultaneous failure is very small. Then it is a matter of scale what counts as "a part" and "several parts". Just zoom in and out on the project's design and you see it. It is more like a fractal. If you don't allow yourself to rely on anything, you get nothing built. You may design perfect fail safe systems, you just cannot build them. The bottom line is: never claim that your system is fully fail safe, no matter the strategy and the care you put into designing and building it. There is no spoon.
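For a back-of-the-envelope feel for that bet, a tiny sketch; the per-channel failure probability is invented, and real systems do worse than the printed numbers because common-mode failures (same spec, same bug, shared arbiter) break the independence assumption.

    import std.math : pow;
    import std.stdio;

    void main()
    {
        double p = 1e-4;   // assumed probability that a single channel fails (made up)

        foreach (n; 1 .. 5)
            writefln("%d independent channel(s): P(all fail together) = %.0e", n, pow(p, n));
    }

Whether the printed figure is 1e-8 or 1e-16, it is only as good as the independence assumption, which is why the arbiter and the shared specification remain the weak points.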
Nov 06 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/1/13 8:03 AM, bearophile wrote:
 Walter Bright:

 ...
Everyone who writes safety critical software should really avoid languages unable to detect integral overflows (at compile-time or run-time) in all normal numerical operations,
I'm unclear on why you seem so eager to grind that axe. The matter seems to be rather trivial - disallow statically the use of built-in integrals, and prescribe the use of library types that do the verification. A small part of the codebase that's manually verified (such as the library itself) could use the primitive types. Best of all worlds. In even a medium project, the cost of the verifier and maintaining that library is negligible.
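A minimal sketch of what such a library type could look like; this is not Phobos code, just the shape of the idea, and only + and - are handled.

    import core.checkedint : adds, subs;

    struct Checked
    {
        int value;

        Checked opBinary(string op)(Checked rhs) const
        {
            bool overflow = false;
            static if (op == "+")
                int r = adds(value, rhs.value, overflow);
            else static if (op == "-")
                int r = subs(value, rhs.value, overflow);
            else
                static assert(0, "operator not sketched: " ~ op);
            if (overflow)
                throw new Exception("integer overflow on '" ~ op ~ "'");
            return Checked(r);
        }
    }

    unittest
    {
        import std.exception : assertThrown;

        assert((Checked(2) + Checked(3)).value == 5);
        assertThrown!Exception(Checked(int.max) + Checked(1));
    }

The verification cost then lives in one small, reviewable place rather than being scattered over the whole codebase.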
 and languages that have undefined operations in their basic
 semantics.
We need to get SafeD up to snuff!
 So Ada language is OK, C and D are not OK for safety critical software.
Well that's Ada's claim to fame. But I should hope D would have a safety edge over C. Andrei
Nov 02 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I'm unclear on why you seem so eager to grind that axe.
Because I've tried the alternative: I've seen it catch bugs (unwanted integral overflows) in my code that I had supposed was "good", so I will never again trust languages that ignore overflows. And if we are talking about high integrity software, D's handling of integrals is not good enough.
 The matter seems to be rather trivial - disallow statically the 
 use of built-in integrals, and prescribe the use of library 
 types that do the verification. A small part of the codebase 
 that's manually verified (such as the library itself) could use 
 the primitive types. Best of all worlds. In even a medium 
 project, the cost of the verifier and maintaining that library 
 is negligible.
How many C++ programs do this? Probably very few (even though Clang is now able to catch some of this in C code). How many Ada programs perform those run-time tests? Most of them.
 A small part of the codebase that's manually verified (such as 
 the library itself) could use the primitive types.
In some cases you want to use the run-time tests even in verified code, to guard against hardware errors caused by radiation, noise, etc.
 We need to get SafeD up to snuff!
At the moment safeD means "memory safe D", it's not "safe" regarding other kinds of problems. "Undefined operations" are lines of code like this, some of them are supposed to become defined in future D: a[i++] = i++; foo(bar(), baz()); auto x = int.max + 1; and so on.
 But I should hope D would have a safety edge over C.
Of course :-) Idiomatic D code is much simpler to write correctly compared to C code (but in D code you sometimes write more complex code, so you get a bit of bug-proneness equalization. This is part of human nature). There are many levels of safety; high integrity software is at the top of those levels (and probably even high integrity software has various levels: some submersible guidance software could be not as bug free as software of the Space Shuttle guiding system) and is a very small percentage of all the code written today. Bye, bearophile
Nov 02 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/2/13 9:49 AM, bearophile wrote:
 Andrei Alexandrescu:

 I'm unclear on why you seem so eager to grind that axe.
Because I've tried the alternative: I've seen it catch bugs (unwanted integral overflows) in my code that I had supposed was "good", so I will never again trust languages that ignore overflows. And if we are talking about high integrity software, D's handling of integrals is not good enough.
I disagree.
 The matter seems to be rather trivial - disallow statically the use of
 built-in integrals, and prescribe the use of library types that do the
 verification. A small part of the codebase that's manually verified
 (such as the library itself) could use the primitive types. Best of
 all worlds. In even a medium project, the cost of the verifier and
 maintaining that library is negligible.
How many C++ programs do this? Probably very few (even though Clang is now able to catch some of this in C code). How many Ada programs perform those run-time tests? Most of them.
That has to do with the defaults chosen. I don't think "mission critical" is an appropriate default for D. I do think that D offers better speed than Ada by default, and better abstraction capabilities than both C++ and Ada, which afford it good library checked integrals. I don't see a necessity to move out of that optimum.
 A small part of the codebase that's manually verified (such as the
 library itself) could use the primitive types.
In some cases you want to use the run-time tests even in verified code, to guard against hardware errors caused by radiation, noise, etc.
How is this a response to what I wrote?
 We need to get SafeD up to snuff!
At the moment safeD means "memory safe D", it's not "safe" regarding other kinds of problems. "Undefined operations" are lines of code like this, some of them are supposed to become defined in future D: a[i++] = i++; foo(bar(), baz()); auto x = int.max + 1; and so on.
Agreed, this is not part of the charter of SafeD. We need to ensure left-to-right evaluation semantics through and through.
 But I should hope D would have a safety edge over C.
Of course :-) Idiomatic D code is much simpler to write correctly compared to C code (but in D code you sometimes write more complex code, so you get a bit of bug-proneness equalization. This is part of human nature). There are many levels of safety; high integrity software is at the top of those levels (and probably even high integrity software has various levels: some submersible guidance software could be not as bug free as software of the Space Shuttle guiding system) and is a very small percentage of all the code written today.
And to take that to its logical conclusion, we don't want the defaults in the D language to cater to such applications. Andrei
Nov 02 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 How is this a response to what I wrote?
I have said that sometimes you want to use the safer types even in a place where you said one could use the primitive types.
 And to take that to its logical conclusion, we don't want the 
 defaults in the D language to cater to such applications.
Integral safety (and a few other kinds of safety) is useful even in software that's not high integrity :-) Bye, bearophile
Nov 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/2/13 10:29 AM, bearophile wrote:
 Andrei Alexandrescu:

 How is this a response to what I wrote?
I have said that sometimes you want to use the safer types even in a place where you said one could use the primitive types.
Then just use the safer types! I mentioned the possibility as, well, a possibility. But then how will you use the safer types to implement the safer types?
 And to take that to its logical conclusion, we don't want the defaults
 in the D language to cater to such applications.
Integral safety (and a few other kinds of safety) is useful even in software that's not high integrity :-)
So is speed, and you're among the first to get miffed when we have a performance regression. And for a good reason. You're also one to clamor for backward compatibility, and it would be hard to imagine a more massive one. So just drop this. Andrei
Nov 02 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 We need to ensure left-to-right evaluation semantics through 
 and through.
Good. And Walter agrees with you. I am not asking for a road map with dates, but what are the steps and the work to arrive there? What needs to be done? Bye, bearophile
Nov 02 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/2/13 12:42 PM, bearophile wrote:
 Andrei Alexandrescu:

 We need to ensure left-to-right evaluation semantics through and through.
Good. And Walter agrees with you.
Actually he only agrees partially. In the expression: a[i++] = i++; Walter argues "evaluate right-hand side first, then left-hand side, then assign". I argue "evaluate left-to-right regardless". I suspect this extends to other cases, such as: functions[i++](i++); (assuming functions is an array of functions). I argue that LTR is the one way to go. Not sure he'd agree. Similarly: objects[i++].method(i++); (assuming objects is an array of objects). Here again I think LTR is the simplest and most intuitive rule.
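A small sketch of why the choice matters, simulating both proposed orders by hand (relying on today's unspecified behaviour is exactly the bug to avoid):

    import std.stdio;

    void main()
    {
        // Left-to-right: the index expression is evaluated before the value.
        int[] a = [0, 0, 0];
        int i = 1;
        {
            int index = i++;   // 1
            int value = i++;   // 2
            a[index] = value;  // a == [0, 2, 0]
        }
        writeln(a);

        // Right-hand side first, then the index, then the assignment.
        int[] b = [0, 0, 0];
        i = 1;
        {
            int value = i++;   // 1
            int index = i++;   // 2
            b[index] = value;  // b == [0, 0, 1]
        }
        writeln(b);
    }

Same expression, two different outcomes, which is why the rule has to be pinned down one way or the other.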
 I am not asking for a road map with
 dates, but what are the steps and the work to arrive there? What needs
 to be done?
I think someone should sit down and define behavior for all such cases, then proceed and implement it in the front-end. I haven't done significant front end work so I don't know what that entails. Andrei
Nov 02 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Actually he only agrees partially.
I didn't know this.
 In the expression:

 a[i++] = i++;

 Walter argues "evaluate right-hand side first, then left-hand 
 side, then assign". I argue "evaluate left-to-right 
 regardless". I suspect this extends to other cases, such as:

 functions[i++](i++);

 (assuming functions is an array of functions). I argue that LTR 
 is the one way to go. Not sure he'd agree. Similarly:

 objects[i++].method(i++);

 (assuming objects is an array of objects). Here again I think 
 LTR is the simplest and most intuitive rule.
Where it's too hard for us to tell what the most intuitive behaviour is, it means the code is very counter-intuitive. Such code is going to make me scratch my head regardless of what rule the D compiler will follow. So it's code that I refactor mercilessly to make it clear enough, splitting it into as many lines as needed. All this means that overly complex cases can be disallowed statically by the D compiler. This could break a little code, but it's D code that today relies on undefined behaviour, so turning it into a compile-time error is actually an improvement. So what I am saying is to define semantics for the normal cases, and just statically disallow the hardest cases.
 I think someone should sit down and define behavior for all 
 such cases, then proceed and implement it in the front-end. I 
 haven't done significant front end work so I don't know what 
 that entails.
Timon Gehr has a sharp mind that seems fit to spot most of those cases, with some help from others, like Kenji. So that can be a first step. Perhaps a little DIP could be handy to keep such ideas. Bye, bearophile
Nov 02 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/2/13 1:25 PM, bearophile wrote:
 Where it's too hard for us to tell what the most intuitive
 behaviour is, it means the code is very counter-intuitive. Such code is
 going to make me scratch my head regardless of what rule the D compiler
 will follow. So it's code that I refactor mercilessly to make it clear
 enough, splitting it into as many lines as needed. All this means that
 overly complex cases can be disallowed statically by the D compiler. This
 could break a little code, but it's D code that today relies on
 undefined behaviour, so turning it into a compile-time error is actually
 an improvement. So what I am saying is to define semantics for the normal
 cases, and just statically disallow the hardest cases.
I think you underestimate the fraction of the cases that would be disabled. Pretty much any use where aliasing is a possibility would not pass muster. Andrei
Nov 02 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/2/13 1:25 PM, bearophile wrote:
 Timon Gehr has a sharp mind that seems fit to spot most of those cases,
 with some help from others, like Kenji. So that can be a first step.
 Perhaps a little DIP could be handy to keep such ideas.
They call it "open source" for a reason. Have at it!!! Andrei
Nov 02 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/2/2013 9:49 AM, bearophile wrote:
 not as bug free as software of the Space Shuttle guiding system
Nobody pretends that the Space Shuttle flight control computers are safe. There are 4 of them, plus a 5th with completely different design & software in it. This is coupled with a voting system.
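A minimal sketch of the voting idea, with hypothetical channel values; the real Shuttle arrangement is far more involved, so this only shows the principle.

    import std.algorithm.searching : count;
    import std.stdio;

    // Return the value a strict majority of channels agree on,
    // or signal that no majority exists so a backup can take over.
    int vote(int[] channels)
    {
        foreach (candidate; channels)
            if (channels.count(candidate) * 2 > channels.length)
                return candidate;
        throw new Exception("no majority -- switch to the backup system");
    }

    void main()
    {
        writeln(vote([42, 42, 41, 42]));   // the one faulty channel is outvoted: 42
    }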
Nov 02 2013
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Sunday, 3 November 2013 at 03:03:58 UTC, Walter Bright wrote:
 Nobody pretends that the Space Shuttle flight control computers 
 are safe. There are 4 of them, plus a 5th with completely 
 different design & software in it. This is coupled with a 
 voting system.
I heard they're retooling that system after the last design fell victim to the "hanging chad" problem.
Nov 04 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/4/2013 12:09 PM, Sean Kelly wrote:
 On Sunday, 3 November 2013 at 03:03:58 UTC, Walter Bright wrote:
 Nobody pretends that the Space Shuttle flight control computers are safe.
 There are 4 of them, plus a 5th with completely different design & software in
 it. This is coupled with a voting system.
I heard they're retooling that system after the last design fell victim to the "hanging chad" problem.
?
Nov 04 2013
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Monday, 4 November 2013 at 21:00:25 UTC, Walter Bright wrote:
 On 11/4/2013 12:09 PM, Sean Kelly wrote:
 I heard they're retooling that system after the last design 
 fell victim to the "hanging chad" problem.
?
Jokes are so much less funny when you have to explain them. Voting. Hanging chads. Sigh.
Nov 04 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/4/2013 3:44 PM, Sean Kelly wrote:
 On Monday, 4 November 2013 at 21:00:25 UTC, Walter Bright wrote:
 On 11/4/2013 12:09 PM, Sean Kelly wrote:
 I heard they're retooling that system after the last design fell victim to
 the "hanging chad" problem.
?
Jokes are so much less funny when you have to explain them. Voting. Hanging chads. Sigh.
Ah :-)
Nov 04 2013
prev sibling parent "eles" <eles eles.com> writes:
On Monday, 4 November 2013 at 20:09:27 UTC, Sean Kelly wrote:
 On Sunday, 3 November 2013 at 03:03:58 UTC, Walter Bright wrote:
 I heard they're retooling that system after the last design 
 fell victim to the "hanging chad" problem.
I loved this one. And I learnt a thing: http://www.answers.com/topic/what-is-a-hanging-chad
Nov 05 2013