
digitalmars.D - More evidence that memory safety is the future for programming

reply Walter Bright <newshound2 digitalmars.com> writes:
https://news.ycombinator.com/item?id=22711391

Fitting in with the push for @safe as the default, and the @live
Ownership/Borrowing system for D.

We can either get on the bus or get run over by the bus.
Mar 28
next sibling parent reply JN <666total wp.pl> writes:
On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391

 Fitting in with the push for  safe as the default, and the 
  live Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
Over time we learned that opt-in safety doesn't work and @safe should be the default. Do you think the same will happen with @live?
Mar 28
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 28.03.20 22:03, JN wrote:
 On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391

 Fitting in with the push for  safe as the default, and the  live 
 Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
Over time we learned that opt-in safety doesn't work and safe should be the default. Do you think the same will happen with live?
Why would this happen? The only thing that @safe and @live have in common is that they are function qualifiers that add additional type checking. @safe is in service of an invariant (memory safety), while @live is not. It's just a linting tool that (from the perspective of memory safety) produces exclusively false positives in @safe code. Also, there is no way to negate it.
Mar 28
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 29/03/2020 9:24 AM, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391
 
 Fitting in with the push for  safe as the default, and the  live 
 Ownership/Borrowing system for D.
 
 We can either get on the bus or get run over by the bus.
"This is why D requires and live attribute be added to functions to enable the checking for just those functions, so it doesn't break every program out there." Interesting quote there. But what if we want to use these semantics isolated and not turn it on in every function in use? The alternative I have been proposing is headconst tied to lifetimes, basically a borrowed pointer just with a bit of extra syntax that does propagate within a type. https://gist.github.com/rikkimax/4cb2cc8ddcac33c1a9bb20de432f9dea
Mar 28
parent reply Arine <arine123445128843 gmail.com> writes:
On Saturday, 28 March 2020 at 23:14:05 UTC, rikki cattermole 
wrote:
 On 29/03/2020 9:24 AM, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391
 
 Fitting in with the push for  safe as the default, and the 
  live Ownership/Borrowing system for D.
 
 We can either get on the bus or get run over by the bus.
"This is why D requires and live attribute be added to functions to enable the checking for just those functions, so it doesn't break every program out there." Interesting quote there.
Indeed it is interesting, on one hand, you have @safe being made the default, breaking all code in existence. And then you have @live being introduced so it doesn't break all code in existence. It seems @live is making the same mistake as @safe. How long until @live becomes the default? I also find this claim to be quite bold:
 This is why D uses DFA to catch 100% of the positives with 0% 
 negatives.
When in the review thread previously the sentiment was along the lines of, "patch the holes as they appear". So how do you go from patching holes as they appear, to 100% guaranteeing it catches everything correctly, without thorough testing. Or is that just empty marketing promises?
Mar 28
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 29.03.20 03:08, Arine wrote:
 On Saturday, 28 March 2020 at 23:14:05 UTC, rikki cattermole wrote:
 On 29/03/2020 9:24 AM, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391

 Fitting in with the push for  safe as the default, and the  live 
 Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
"This is why D requires and live attribute be added to functions to enable the checking for just those functions, so it doesn't break every program out there." Interesting quote there.
Indeed it is interesting, on one hand, you have safe being made the default, breaking all code in existence. And then you have live being introduced so it doesn't break all code in existence. It seems live is making the same mistake as safe. How long until live becomes the default? ...
@live will never be the default in any version of D that I am willing to use. I really don't understand why people are inclined to think it is on the same level as @safe. It just is not.
 
 I also find this claim to be quite bold:
 
 This is why D uses DFA to catch 100% of the positives with 0% negatives.
When in the review thread previously the sentiment was along the lines of, "patch the holes as they appear". So how do you go from patching holes as they appear, to 100% guaranteeing it catches everything correctly, without thorough testing. Or is that just empty marketing promises?
It's not a statement that has a standard meaning. What positives and what negatives? Refer to the following table:

                      | problem exists | problem does not exist
----------------------------------------------------------------
flagged by linter     | true positive  | false positive
----------------------------------------------------------------
not flagged by linter | false negative | true negative
----------------------------------------------------------------

@live leads to any of the four kinds of outcomes:

    void foo(int *p) @live{
        free(p);
        free(p); // true positive
    }

    void bar(int *p) @live{
        auto q=p;
        *p=3; // false positive
    }

    void baz() @live{
        auto p=new int;
        free(p); // false negative
    }

    void qux() @life{
        auto p=cast(int*)malloc(int.sizeof);
        free(p); // true negative
    }

An amusing way to interpret 100% positives and 0% negatives would be to say the linter flags all positives (either true or false) and no negatives (neither true nor false). That basically means the linter flags every code as being problematic, which would make it completely useless.

In the thread on hackernews, there were a few people who interpreted the statement as saying that the linter has neither false positives nor false negatives, which is provably impossible. You can get rid of one of them at a time, but not both.

@safe is an example of a feature that does not have false negatives (unless the implementation is buggy), but of course it has loads of false positives, for example you cannot manage memory manually in @safe code without @trusted escapes, even if you do it correctly.
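
For illustration only, the four cells of the table can be written down as a small C function. This is a sketch, not anything from the post: the names `Outcome`, `classify`, `flag_everything`, and `check_flag_everything` are hypothetical. The degenerate "flag everything" linter corresponds to the "amusing" reading of "100% positives, 0% negatives".

```c
#include <assert.h>
#include <stdbool.h>

/* The four cells of the table (hypothetical names). */
typedef enum {
    TRUE_POSITIVE,
    FALSE_POSITIVE,
    FALSE_NEGATIVE,
    TRUE_NEGATIVE
} Outcome;

/* Classify one linter verdict against the ground truth. */
Outcome classify(bool problem_exists, bool flagged)
{
    if (flagged)
        return problem_exists ? TRUE_POSITIVE : FALSE_POSITIVE;
    return problem_exists ? FALSE_NEGATIVE : TRUE_NEGATIVE;
}

/* A degenerate linter that flags every input, whatever the ground truth. */
bool flag_everything(bool problem_exists)
{
    (void)problem_exists; /* the verdict ignores the ground truth */
    return true;
}

/* Self-check: the degenerate linter only ever produces positives. */
void check_flag_everything(void)
{
    assert(classify(true, flag_everything(true)) == TRUE_POSITIVE);
    assert(classify(false, flag_everything(false)) == FALSE_POSITIVE);
}
```

Such a linter has no negatives of either kind, but every correct program lands in the false positive cell, which is why that reading makes the tool useless.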
Mar 28
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 29.03.20 04:28, Timon Gehr wrote:
 
 void qux() @life{
Actually, there is a true positive on this line.
Mar 28
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 29.03.20 04:28, Timon Gehr wrote:
 
 An amusing way to interpret 100% positives and 0% negatives would be to 
 say the linter flags all positives (either true or false) and no 
 negatives (neither true nor false). That basically means the linter 
 flags every code as being problematic, which would make it completely 
 useless.
Another way to interpret it is to just take it as the statement that everything that is flagged by the linter is a positive, and everything else is a negative. This is the definition of positive and negative.
Mar 28
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 28.03.20 21:24, Walter Bright wrote:
  live Ownership/Borrowing system
@live is not an Ownership/Borrowing system, even though it is true that it is based on concepts related to ownership and borrowing. An Ownership/Borrowing system enforces ownership semantics in @safe code, @live does not. It is a linter for @system and @trusted code with no safety guarantees.
Mar 28
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 Fitting in with the push for  safe as the default, and the 
  live Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
I have recently added a lot of unittests to my code. That confirmed to me that, as we all know, it is mandatory to do so if an even remotely bug-free program is desired :). And that's in a program that already had no global state and used ranges, `final switch`es and `assert`s fairly heavily.

So I am thinking, perhaps a dead-easy and well-known way to test correct `malloc`-`free` pairing in unit tests would work just as well, but be easier to implement and use? It'd be something like this to use:

```
unittest
{
    auto tracer = MallocTracer();
    if (true)
    {
        auto raiiObject = allocAnObject();
        raiiObject.doSomething();
        // one allocation is still outstanding inside the scope
        assert(tracer.numMallocs == tracer.numFrees + 1);
    }
    // the RAII object has been destroyed, so everything is freed again
    assert(tracer.numMallocs == tracer.numFrees);
}
```
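
A rough C sketch of the counting idea, for illustration only. The post does not say how `MallocTracer` would be built, so everything here is an assumption: the names `traced_malloc`, `traced_free`, and `outstanding_allocations` are hypothetical, and this version only counts calls that go through the wrappers instead of intercepting `malloc` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracer: counts allocations and frees made through the
   wrappers below. Direct malloc/free calls are not seen. */
typedef struct {
    size_t num_mallocs;
    size_t num_frees;
} MallocTracer;

MallocTracer g_tracer = {0, 0};

/* Allocate through the tracer; only successful allocations are counted. */
void *traced_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        g_tracer.num_mallocs++;
    return p;
}

/* Free through the tracer; freeing NULL is a no-op and is not counted. */
void traced_free(void *p)
{
    if (p == NULL)
        return;
    g_tracer.num_frees++;
    free(p);
}

/* Number of traced allocations that have not been freed yet. */
size_t outstanding_allocations(void)
{
    assert(g_tracer.num_mallocs >= g_tracer.num_frees);
    return g_tracer.num_mallocs - g_tracer.num_frees;
}
```

A test would then check that `outstanding_allocations()` is 1 while an object is alive and 0 after it has been released. A counter like this catches leaks and unbalanced pairs, but not use-after-free or a double free of the same pointer.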
Mar 30
parent reply Atila Neves <atila.neves gmail.com> writes:
On Monday, 30 March 2020 at 12:58:33 UTC, Dukc wrote:
 On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 [...]
I have recently added a lot of unittests to my code. That confirmed me that as we all know, it is a mandatory to do if I even a remotely bug-free program is desired :). And that's in a program that already had no global state and used ranges, `final switch`es and `assert`s fairly much. [...]
It's easier to use asan with ldc. I did write an allocator to do this before asan was available though: https://github.com/atilaneves/test_allocator
Mar 30
parent reply Dukc <ajieskola gmail.com> writes:
On Monday, 30 March 2020 at 13:20:08 UTC, Atila Neves wrote:
 It's easier to use asan with ldc. I did write an allocator to 
 do this before asan was available though: 
 https://github.com/atilaneves/test_allocator
Yeah, something like those is what I meant. Thanks - I have to remember those when next having problems with `malloc`s. Lowering the bar to use a tool like these is IMO more effective than pushing Rust-like static analysis. Basically, to cut down memory problems a sanitizer should be as easy to use as the built-in `unittest`s. Sure static checks can be useful too, but to be worth it they need to be easier to use than the sanitizer, and in any case static checks can't completely replace sanitizers.
Mar 31
parent reply Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 31 March 2020 at 23:08:01 UTC, Dukc wrote:

 Basically, to cut down memory problems a sanitizer should be as 
 easy to use as the built-in `unittest`s.
It is:

    dflags: "-fsanitize=address" platform="ldc"

If you're not using dub, then:

    ldc2 -fsanitize=address --unittest $REST_OF_ARGS
Apr 01
parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 1 April 2020 at 14:30:20 UTC, Atila Neves wrote:
 On Tuesday, 31 March 2020 at 23:08:01 UTC, Dukc wrote:
 Basically, to cut down memory problems a sanitizer should be 
 as easy to use as the built-in `unittest`s.
It is: dflags: "-fsanitize=address" platform="ldc" If you're not using dub, then: ldc2 -fsanitize=address --unittest $REST_OF_ARGS
Great!
Apr 01
prev sibling parent reply Johan <j j.nl> writes:
On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391

 Fitting in with the push for  safe as the default, and the 
  live Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
Why is this news? Clang has had this for a decade, and it certainly wasn't the first. If there is a bus, it left the station a very long time ago.

Isn't it more interesting to find a comprehensive resource management solution, instead of working on a solution only for the special case of memory [*]? A double file close is also bad, for example. Maybe RAII and move semantics isn't it, but at least it doesn't single out one type of resource.

-Johan

[*] Let alone the even more special case of functions with the exact symbol names "malloc" and "free" in this case...
Mar 30
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/30/2020 12:25 PM, Johan wrote:
 Clang has had this for a decade
Do you mean RAII? RAII is only a partial solution. For example, it is quite easy for an RAII object to leak a reference to its internals, and then the RAII object gets deleted, but the reference is still there.
Mar 30
parent reply Johan <j j.nl> writes:
On Tuesday, 31 March 2020 at 01:26:50 UTC, Walter Bright wrote:
 On 3/30/2020 12:25 PM, Johan wrote:
 Clang has had this for a decade
Do you mean RAII? RAII is only a partial solution. For example, it is quite easy for an RAII object to leak a reference to its internals, and then the RAII object gets deleted, but the reference is still there.
I meant static analysis.
Apr 01
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/1/2020 12:03 PM, Johan wrote:
 I meant static analysis.
I'm not familiar specifically with clang's static analysis, but my experience with such is they detect a few obvious patterns, and miss the subtle ones that cause all the trouble. For example,

    int* foo(int i) { return &i; }

gets detected by static analysis. This one does not:

    int* bar(int* p) { return p; }

    int* foo(int i) { return bar(&i); }

The idea with D is to solve it in the general case, not by matching specific patterns. #Dip1000 is how D solves the second case in general.
Apr 01
parent reply Jacob Carlborg <doob me.com> writes:
On Wednesday, 1 April 2020 at 21:44:27 UTC, Walter Bright wrote:

     int* foo(int i) { return &i; }
This is detected by Clang without running the static analyzer:

    $ clang main.c
    main.c:1:27: warning: address of stack memory associated with parameter 'i' returned [-Wreturn-stack-address]
    int* foo(int i) { return &i; }
                              ^
    1 warning generated.
 gets detected by static analysis. This one does not:

     int* bar(int* p) { return p; }

     int* foo(int i) { return bar(&i); }
The Clang static analyzer detects this:

    $ clang --analyze main.c
    main.c:2:19: warning: Address of stack memory associated with local variable 'i' returned to caller
    int* foo(int i) { return bar(&i); }
             ~~~~~    ^~~~~~~~~~~~~~
    1 warning generated.

--
/Jacob Carlborg
Apr 02
next sibling parent Johan <j j.nl> writes:
On Thursday, 2 April 2020 at 10:04:24 UTC, Jacob Carlborg wrote:
 On Wednesday, 1 April 2020 at 21:44:27 UTC, Walter Bright wrote:

 gets detected by static analysis. This one does not:

     int* bar(int* p) { return p; }

     int* foo(int i) { return bar(&i); }
The Clang static analyzer detects this: $ clang --analyze main.c main.c:2:19: warning: Address of stack memory associated with local variable 'i' returned to caller int* foo(int i) { return bar(&i); } ~~~~~ ^~~~~~~~~~~~~~ 1 warning generated.
Come on, of course such a super simple example is detected by clang's static analyzer. From my Inkscape days, I remember a case where the static analyzer found a bug with 80+ steps across multiple cpp files and complex control flow.

I imagine Clang's analyzer is doing proofs similar to the ones the D compiler is doing. My guess is that the problem is similar to the halting problem. And memory issues like this are just not provable (I think): https://github.com/dlang/dmd/pull/7050

Running the program with a sanitizer (memory, threads, UB, ...) appears easier to me in the many cases where this is possible. Nice to see you are advocating it Atila! :)

My only point was to question the newness of the news in OP and the statement that we need to "get on the bus". C++ has had powerful static analysis engines for a long time, and I don't think it changed the landscape. C++ has very nice runtime sanitizers, but again I don't think it is changing the landscape as much as one may have expected.

-Johan
Apr 02
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/2/2020 3:04 AM, Jacob Carlborg wrote:
 $ clang --analyze main.c
 main.c:2:19: warning: Address of stack memory associated with local variable
'i' 
 returned to caller
 int* foo(int i) { return bar(&i); }
           ~~~~~    ^~~~~~~~~~~~~~
 1 warning generated.
Now try:

   int* bar(int* p);
   int* foo(int i) { return bar(&i); }

And then:

   struct S { int* p; };
   struct S foo(struct S* ps, int i) { ps->p = &i; return *ps; }

It falls apart. Now let's try D:

   struct S { int* p; }
   @safe S foo(S* ps, int i) {
     ps.p = &i; // Error: cannot take address of parameter i in @safe function foo
     return *ps;
   }

The point is to get them all, not a few simple patterns.
Apr 02
parent Walter Bright <newshound2 digitalmars.com> writes:
Some experimenting with clang shows it loses track of things when one level of 
indirection is added:

   struct S* malloc();
   void free(struct S*);

   void nut(struct S* s, int* pi) { free(s); *pi = 4; }

   void bolt()
   {
     struct S* s = malloc();
     struct S** ps = &s;     // <= add indirection
     nut(*ps, (*ps)->i);
   }

or when extern functions are used (i.e. function bodies are not available).

Other things clang doesn't detect:

   int* malloc();
   void free(int*);

   int nut();

   void bolt(int i)
   {
     int* p = malloc();
     *p = 1;
   }

Doesn't find the memory leak. Also, if you write your own storage allocator, 
clang doesn't pick it up.

clang actually does a nice job with what it has to work with - it's the C and 
C++ languages that are not amenable to doing it 100%.
Apr 02
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Monday, 30 March 2020 at 19:25:53 UTC, Johan wrote:
 On Saturday, 28 March 2020 at 20:24:02 UTC, Walter Bright wrote:
 https://news.ycombinator.com/item?id=22711391

 Fitting in with the push for  safe as the default, and the 
  live Ownership/Borrowing system for D.

 We can either get on the bus or get run over by the bus.
Isn't it more interesting to find a comprehensive resource management solution, instead of working on a solution only for the special case of memory [*]. A double file close is also bad, for example. Maybe RAII and move semantics isn't it, but at least it doesn't single out one type of resource.
I have to agree with you, most of the time I don't care about memory, but rather whatever I am modelling within that memory. Yes, that 24 might look like a regular int, but to me it holds significance beyond the fact that it is a mere int.

For my web library spasm I am dealing with JS objects that I have to release at the right time. Things get hairy with delegates, callbacks and long-lived references. At first I tried reference counting, but found out there was significant bloat (I like to keep my web binaries small). Eventually I settled for non-copyable objects so I get unique references, and release them on the JS side when the last and only reference goes out of scope.

Of course now I run into other issues. 'scope ref' helps a bit, but I find that I have to write `move` a lot. The parts where I am struggling a bit are where I get a handle to a JS object that is conceptually a sumtype, an optional or a base object and need to unwrap or upcast things. I have solved them, but with some rather gnarly @system code.

Remember, I want unique references (and borrow) to plain ints here. I know it might look to the compiler like a regular int it can simply copy (and it is), but to me it looks like a JS mouseevent object that needs cleanup after the last reference is gone.
Mar 31
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 31 March 2020 at 07:26:38 UTC, Sebastiaan Koppe wrote:
 At first I tried reference counting, but found out there was 
 significant bloat (I like to keep my web binaries small), 
 eventually I settled for non-copyable objects so I get unique 
 references, and release them on the JS side when the last and 
 only reference goes out of scope.
How do you do this? Do you do ref counting on the JS side? I see that there is a proposal for weak references for javascript: https://v8.dev/features/weak-references I guess that could be useful.
Mar 31
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 31 March 2020 at 11:38:40 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 31 March 2020 at 07:26:38 UTC, Sebastiaan Koppe 
 wrote:
 At first I tried reference counting, but found out there was 
 significant bloat (I like to keep my web binaries small), 
 eventually I settled for non-copyable objects so I get unique 
 references, and release them on the JS side when the last and 
 only reference goes out of scope.
How do you do this? Do you do ref counting on the JS side?
No, I keep all the JS objects in a JS array so the GC won't free them. When the time comes I call a release function from D, which removes the object from the array.

In the off case that the D code takes hold of the same JS object twice (e.g. the same querySelector twice, or similar), there would be 2 entries in the JS array for the same object, each having its own unique reference.

JS engines handle objects in arrays pretty well. It works nicely combined with the unique reference semantics I have on the D side. And if you really need to, you can wrap it in a refcount and get the best of both worlds.
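
The scheme described above amounts to a handle table. Below is a minimal sketch of that idea in C, as an analogy only. It is not spasm's actual code: the real table lives on the JS side, and the names `g_slots`, `handle_acquire`, `handle_get`, and `handle_release` plus the fixed capacity are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 64

/* Slot table standing in for the JS-side array: a non-NULL slot is what
   keeps the object reachable. */
void *g_slots[MAX_HANDLES];

/* Store obj in the first free slot and return the slot index as a unique
   handle, or -1 if the table is full. The same object may be stored more
   than once; each call yields a distinct handle. */
int handle_acquire(void *obj)
{
    assert(obj != NULL);
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (g_slots[i] == NULL) {
            g_slots[i] = obj;
            return i;
        }
    }
    return -1;
}

/* Look up the object behind a handle; NULL if released or out of range. */
void *handle_get(int h)
{
    if (h < 0 || h >= MAX_HANDLES)
        return NULL;
    return g_slots[h];
}

/* Release a handle: clearing the slot drops the table's reference. */
void handle_release(int h)
{
    if (h >= 0 && h < MAX_HANDLES)
        g_slots[h] = NULL;
}
```

Acquiring the same object twice yields two independent handles, so releasing one leaves the other valid. This mirrors the "2 entries in the JS array" case, and it works without reference counting as long as each handle has exactly one owner.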
 I see that there is a proposal for weak references for 
 javascript:
 https://v8.dev/features/weak-references

 I guess that could be useful.
That is pure JS though. There is a webassembly proposal to introduce anyref, whereby you can move js objects into wasm, and the js engine will track them. It might take a while before that is available.
Mar 31