digitalmars.dip.ideas - Type state analysis
- Richard (Rikki) Andrew Cattermole (40/40) Mar 16 This proposal introduces a new semantic pass, that uses an IR to
- claptrap (3/6) Mar 17 Any examples of languages that use this and what it provides?
- Richard (Rikki) Andrew Cattermole (12/19) Mar 17 into mainstream programming languages. However, many academic projects
- Paul Backus (14/18) Mar 17 Yet another proposal for a humongous language feature to solve a
- Atila Neves (2/17) Apr 08 I second all of the above.
- Richard (Rikki) Andrew Cattermole (15/36) Apr 08 Just so we are clear, DIP1000 is not able to provide memory safety on a
- Atila Neves (17/49) Apr 10 DIP1000 depends on lifetimes and RAII, and in a language like D
- Richard (Rikki) Andrew Cattermole (7/60) Apr 10 Yes, because of the explicit scope creation. The compiler can see it
- Atila Neves (13/75) Apr 10 Unless I'm missing something, that's exactly what DIP1000 does?
- Richard (Rikki) Andrew Cattermole (37/65) Apr 10 See my earlier example:
- Dukc (16/28) Apr 13 Assuming you meant `destroy(*parent);` instead of destroying only
- Walter Bright (24/24) Mar 29 The @live functions do type state analysis - data flow analysis is used ...
- Richard (Rikki) Andrew Cattermole (58/92) Mar 29 Yes, typical applications of DFA need to be run on the whole program to
- Walter Bright (8/8) Mar 29 In order to make this work, it will need typestate annotations on all:
- Richard (Rikki) Andrew Cattermole (17/18) Mar 29 If people had to write annotations yes.
This proposal introduces a new semantic pass that uses an IR to analyse memory transfers and the different states memory can be in. It is the bleeding edge of program security, and would enable other memory analysis solutions such as isolated (immutable references ala Midori). It enables:

- Preventing reads of uninitialized variables
- Preventing reads, writes, and method calls on null objects
- Catching logic errors such as trying to read from an unopened file
- A framework for analysing unreachable vs reachable variables

It does not affect the type system. Additional information must be provided by the user if they want to go beyond the defaults. The default type state of each variable is meant to be as close as possible to what the compiler can prove, with user assistance such as null checks. A key premise is that a newer edition `@safe` function will not be able to call an older `@safe` function. This drastically simplifies decisions.

The end goal of this is to enable D to become temporally safe, which is critical to D's future (see the recent US government report on memory safety, with specific mention of temporal safety). Without this, the framework required to do temporal analysis does not exist, and what would otherwise be very clear decisions become arguable at best, at worst complete unknowns.

For each variable declaration I am proposing a default type state, along with manual specification of the input and output states it may be in, by using the UAX31 medial character: ``identifier'input'output``.

This proposal was born out of necessity after failure to solve the issues surrounding isolated, which also required a similar IR to be built. I recognize that this could be a bit costly, but due to the potential performance wins as well as memory and logic safety, this is an absolute must if we want to be competitive.
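As a rough illustration of the idea (not the proposal's actual syntax, and in Python rather than D since the D syntax does not exist yet), the default type states can be modeled as a small ordered lattice that a checker consults per variable; the state names here are assumptions based on the proposal's prose:

```python
# Hypothetical sketch: type states as an ordered lattice.
# The state names and their ordering are assumptions for illustration.
STATES = ["unreachable", "uninitialized", "initialized", "nonnull"]

def rank(state):
    return STATES.index(state)

def satisfies(actual, required):
    # A variable may be used where an equal or weaker state is
    # required, e.g. nonnull satisfies initialized.
    return rank(actual) >= rank(required)

def merge(a, b):
    # At a control-flow join, only the weaker of the two incoming
    # states is guaranteed.
    return STATES[min(rank(a), rank(b))]

print(satisfies("nonnull", "initialized"))   # True
print(satisfies("initialized", "nonnull"))   # False: could still be null
print(merge("nonnull", "initialized"))       # initialized
```

The `merge` rule is what makes the analysis flow-sensitive: after an `if`/`else`, a variable only keeps the strongest state both branches can prove.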
Latest: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40 Current: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0
Mar 16
On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:
> This proposal introduces a new semantic pass, that uses an IR to analyse memory transfers and the different states it can be in.

Any examples of languages that use this and what it provides?
Mar 17
On 18/03/2024 12:25 AM, claptrap wrote:
> Any examples of languages that use this and what it provides?

From Wikipedia:

> Typestate is an experimental concept that has not yet crossed over into mainstream programming languages. However, many academic projects actively investigate how to make it more useful as an everyday programming technique. Two examples are the Plaid and Obsidian languages, which are being developed by Jonathan Aldrich's group at Carnegie Mellon University.[16][17] Other examples include the Clara[18] language research framework, earlier versions of the Rust language, and the >> keyword in ATS.[19]

This is the bleeding edge of research.

https://en.wikipedia.org/wiki/Typestate_analysis

If we get this, we'll be writing the literature.
Mar 17
On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Latest: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40
> Current: https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0

Yet another proposal for a humongous language feature to solve a problem that can already be solved without it. It is possible to achieve temporal safety in D already with `scope` (DIP 1000) and `@system` variables (DIP 1035). The ergonomics are not great, but they can be improved (e.g., with built-in move-on-last-use).

This proposal specifically has the same problem as `@live`: it splits `@safe` code into separate "new `@safe`" and "old `@safe`" dialects, which are mutually incompatible.

The ideas themselves are not terrible. I would be interested to see what this looks like implemented in a research language. But I do not think it has any place in D.
Mar 17
On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
> [...]

I second all of the above.
Apr 08
On 09/04/2024 6:18 AM, Atila Neves wrote:
> I second all of the above.

Just so we are clear, DIP1000 is not able to provide memory safety on a single thread, let alone multiple. It is missing the child/parent relationship needed to prevent:

```d
Parent* parent;
Field* child = &parent.field;
parent.destroy;
Field value = *child;
```

This happens to be one of the things that the DFA is capable of tracking, and that is not currently possible to implement (Dennis has tried). It also does not provide any way of tracking memory from creation to ensure it can be transferred safely to another thread in its entirety, to prevent needing things like locks.
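The bookkeeping such a pass would do for the example above can be made concrete outside of D. Here is a hypothetical Python sketch (all names invented for illustration) that records which borrowed pointers derive from which owner, and poisons them when the owner is destroyed:

```python
# Hypothetical sketch of child/parent invalidation tracking.
class Checker:
    def __init__(self):
        self.derived_from = {}   # borrowed pointer -> its owner
        self.dead = set()        # owners that have been destroyed

    def borrow(self, child, parent):
        self.derived_from[child] = parent

    def destroy(self, parent):
        self.dead.add(parent)

    def read(self, var):
        owner = self.derived_from.get(var)
        if owner in self.dead:
            raise ValueError(f"{var} outlives destroyed owner {owner}")

c = Checker()
c.borrow("child", "parent")   # Field* child = &parent.field;
c.destroy("parent")           # parent.destroy;
try:
    c.read("child")           # Field value = *child; -- must be rejected
except ValueError as e:
    print("error:", e)
```

The point being made in the post is that this is a per-value relationship, not a per-scope one, so scope-based checking alone cannot express it.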
Apr 08
On Monday, 8 April 2024 at 20:10:24 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Just so we are clear, DIP1000 is not able to provide memory safety on a single thread, let alone multiple. It is missing the child/parent relationship to prevent:
>
> ```d
> Parent* parent;
> Field* child = &parent.field;
> parent.destroy;
> Field value = *child;
> ```

DIP1000 depends on lifetimes and RAII, and in a language like D with constructors and destructors I struggle to find a reason to write `parent.destroy`. This works as intended, in the sense that it fails to compile:

```d
Field* child;
{
    Parent parent;
    child = &parent.field;
}
Field value = *child;
```

It's the same reason why I don't understand "Logic errors such as trying to read from an unopened file" - why would a file be unopened?
Apr 10
On 11/04/2024 7:17 AM, Atila Neves wrote:
> This works as intended, in the sense that it fails to compile: [...]

Yes, because of the explicit scope creation. The compiler can see it cross scopes, but not within a scope; that requires tracking of values and their order of invalidation.

> It's the same reason why I don't understand "Logic errors such as trying to read from an unopened file" - why would a file be unopened?

Accidental, different scope states (conditional/function calls), explicit closing, events handled automatically by the OS. A lot of reasons.
Apr 10
On Wednesday, 10 April 2024 at 19:23:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Yes, because of the explicit scope creation. The compiler can see it cross scopes, but not within a scope; that requires tracking of values and their order of invalidation.

Unless I'm missing something, that's exactly what DIP1000 does? How else would "order of invalidation" work except through scopes? Explicit destruction? Why do that?

> Accidental, different scope states (conditional/function calls),

What would an example of that look like?

> explicit closing

Doctor, doctor, it hurts when...

> , events handled automatically by OS.

Example?

> A lot of reasons.

I haven't encountered any; I'm not saying they don't exist, I'm saying that in my experience they're rare enough that they're approximately zero. I don't think that's ever happened since I stopped writing C. If one creates a file, it's open. If the file is no longer in scope, it's closed.
Apr 10
On 11/04/2024 7:31 AM, Atila Neves wrote:
> Unless I'm missing something, that's exactly what DIP1000 does? How else would "order of invalidation" work except through scopes? Explicit destruction? Why do that?

See my earlier example:

```d
Parent* parent;
Field* child = &parent.field;
parent.destroy;
Field value = *child;
```

That simply shouldn't compile. One single scope, and it absolutely should be under DIP1000's purview. Solving this is critical for owning data structures (which can include memory allocators).

Note: destruction also includes assignments to the variable that had contained the data structure.

```d
DS ds = ...;
Item* item = ds[...];
ds = ...;
Item value = *item;
```

Same exact pattern.

> What would an example of that look like?

```d
mayCloseOnSpecificInput(file);
file.read(); // may or may not be closed
```

Add an if/else block to it as well, well... Fun.

> Doctor, doctor, it hurts when...

And like gotos, we don't allow you to skip variable declarations. "Don't explicitly close and then mishandle the branches" doesn't mean we shouldn't have compiler assistance to prevent you from making this mistake.

> Example?

For a socket: the peer closed the connection. A normally occurring event that kills the handle automatically.

> I haven't encountered any; I'm not saying they don't exist, I'm saying that in my experience they're rare enough that they're approximately zero.

For context: a file representation is the classic example from the 80's for type state analysis. However, there was a giant thread only a couple of weeks back wanting nullability type state checks. Notice how a lot of languages now have nullability analysis in them? Yeah. That would be a much better example from a practical standpoint.
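The conditional-close problem being argued about here is easy to sketch outside of D. In this hypothetical Python illustration, a file that is closed on one branch but not the other merges to a "maybe-closed" state at the join point, which a checker must reject on read:

```python
# Hypothetical sketch: merging file states at a control-flow join.
def merge(a, b):
    # If both branches agree, keep that state; otherwise be pessimistic.
    return a if a == b else "maybe-closed"

def check(branch_states, op):
    state = branch_states[0]
    for s in branch_states[1:]:
        state = merge(state, s)
    if op == "read" and state != "open":
        return f"error: read on file in state '{state}'"
    return "ok"

# mayCloseOnSpecificInput(file): one path closes, the other does not.
print(check(["open", "closed"], "read"))  # rejected: maybe-closed
print(check(["open", "open"], "read"))    # accepted: provably open
```

This is the same merge rule that makes null checks work: the checker only keeps what every incoming path can prove.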
Apr 10
On Wednesday, 10 April 2024 at 19:47:03 UTC, Richard (Rikki) Andrew Cattermole wrote:
> See my earlier example:
>
> ```d
> Parent* parent;
> Field* child = &parent.field;
> parent.destroy;
> Field value = *child;
> ```
>
> That simply shouldn't compile.

Assuming you meant `destroy(*parent);` instead of destroying only the pointer to it. `destroy` runs the destructor of the object, but it doesn't free it. It even sets the object to its `.init` value, so it's known exactly what any further calls on the object will do. So your example is safe.

I know, what you're saying is that one can't give `@safe` access to a field that is deallocated when the struct is destroyed, even with a scoped reference, because the struct can be destroyed before the reference expires. We've probably been through this before: one can, but only inside a callback such as `borrow` for `SafeRefCounted`, which locks the data structure, preventing destruction while the callback is executed.
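The borrow-callback pattern Dukc mentions can be sketched in a few lines of Python (the class and method names here are invented for illustration; this is not the `SafeRefCounted` API): the field is only reachable inside a callback, and destruction is refused while any callback is live.

```python
# Hypothetical sketch of the borrow-callback pattern:
# the owner cannot be destroyed while a borrow is outstanding.
class Owner:
    def __init__(self):
        self.field = 42
        self.borrows = 0
        self.destroyed = False

    def borrow(self, callback):
        self.borrows += 1
        try:
            return callback(self.field)   # field cannot dangle in here
        finally:
            self.borrows -= 1

    def destroy(self):
        if self.borrows:
            raise RuntimeError("cannot destroy while borrowed")
        self.destroyed = True

o = Owner()
print(o.borrow(lambda f: f + 1))  # 43
o.destroy()                       # fine: no outstanding borrow
```

The trade-off, as the thread notes, is ergonomics: every access has to go through the callback rather than a plain reference.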
Apr 13
The `@live` functions do type state analysis - data flow analysis is used to determine whether a variable is 'live' or not. It is indeed costly to do DFA in the front end; that's one reason why it's restricted to `@live` functions.

The DFA could be extended for null checking, but in practice, null checking is not that effective:

```d
class C { void xx(); }
struct S { C c; }

C mars(C c) { return null; }

void phobos(ref S s)
{
    C c;
    c.xx();       // detected
    mars(c).xx(); // needs whole program DFA to detect
    s.c.xx();     // cannot be detected
}
```

This is why `@live` functions won't work without `scope`, `ref`, and `return` annotations on the functions they interface with. `@live` functions, like Rust, also severely limit the kinds of data structures possible.

Other type state analysis currently done in D:

1. Value Range Propagation
2. whether a field is initialized or not in a constructor is tracked
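Walter's three cases can be mimicked with a deliberately local (intraprocedural) null tracker. In this hypothetical Python sketch, only locals with a definitely-null state are flagged; call results and fields collapse to "unknown", which is exactly why the second and third calls escape detection without whole-program DFA:

```python
# Hypothetical sketch: an intraprocedural null DFA over a flat op list.
def analyze(ops):
    state = {}    # variable -> "null" | "unknown"
    errors = []
    for op, var in ops:
        if op == "declare":
            state[var] = "null"       # C c; -> default-initialized to null
        elif op == "call_result":
            state[var] = "unknown"    # mars(c) -> needs whole-program DFA
        elif op == "field":
            state[var] = "unknown"    # s.c -> cannot be tracked locally
        elif op == "method":
            if state.get(var) == "null":
                errors.append(f"{var} is definitely null")
    return errors

print(analyze([
    ("declare", "c"),     ("method", "c"),   # c.xx():       detected
    ("call_result", "t"), ("method", "t"),   # mars(c).xx(): missed
    ("field", "f"),       ("method", "f"),   # s.c.xx():     missed
]))  # ['c is definitely null']
```

Only the first case produces a diagnostic, matching the `// detected` / `// cannot be detected` comments in the D example.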
Mar 29
On 30/03/2024 6:10 AM, Walter Bright wrote:
> The `@live` functions do type state analysis [...]

Yes, typical applications of DFA need to be run on the whole program to work, as far as I'm aware. ML family compilers do this almost exclusively; they cannot do multi-step builds. D, however, is the opposite.

Given that what I'm suggesting is not the nominal use case for DFA (quite the opposite), and we already have an approach thanks to DIP1000, I am proposing to go function by function. You must annotate the input and output type states for function parameters and the return value if the defaults are less than what you want guaranteed. The pass acts purely as verification against this. Inference only happens within the body of a function; it does not affect the signature, including for templates (it is not part of the type system). Because of these decisions it can be parallelized without concern (unless you need to run semantic on newly inserted statements for cleanup).

See my recently added example that Rust cannot check, since it hasn't got type state analysis in production.
https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40

```d
T* makeNull(T)() @safe {
    return null;
}

void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per makeNull's return state

    *var = 42; // segfault due to var being null
}
```

What we want to happen instead:

```d
T* makeNull(T)(/* return'initialized */) @safe {
    return null;
    // the type state default is more than the type state initialized,
    // so it is accepted
}

void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per makeNull's return state

    // this load errors because initialized is less than the nonnull type state
    // Error: Variable var is in type state initialized which could be null, cannot write to it
    *var = 42;
}
```

To fix, simply check for null!

```d
void useNull() @safe {
    int* var = makeNull!int();
    // var is in type state initialized as per makeNull's return state

    if (var !is null) {
        // in this scope, var is assumed to be in type state nonnull
        *var = 42;
    }
}
```
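The narrowing step in the last snippet (the `if (var !is null)` branch upgrading the state) is the core mechanism. A hypothetical Python sketch of just that check, with the state names taken from the example's comments:

```python
# Hypothetical sketch of flow-sensitive narrowing from the example:
# a write through a pointer requires the nonnull type state.
def check_write(state):
    if state != "nonnull":
        return ("Error: variable is in type state 'initialized' "
                "which could be null, cannot write to it")
    return "ok"

state = "initialized"        # int* var = makeNull!int();
print(check_write(state))    # *var = 42; -> rejected

# Inside "if (var !is null) { ... }" the checker narrows the state:
narrowed = "nonnull"
print(check_write(narrowed)) # *var = 42; -> accepted in this branch
```

The key property is that the narrowing is purely local to the guarded branch: after the `if`, the state merges back to `initialized`.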
Mar 29
In order to make this work, it will need typestate annotations on all:

1. function returns
2. aggregate fields
3. array element types
4. pointed-to types
5. function parameters

I suspect that would make it bulletproof. Isn't it a big ask to get people to add all those annotations?
Mar 29
On 30/03/2024 3:30 PM, Walter Bright wrote:
> Isn't it a big ask to get people to add all those annotations?

If people had to write annotations, yes. The key will be making sure that they don't have to; the defaults should be good enough, or at bare minimum match what we assume today.

What I am unsure about currently relates to pointers (other than the this pointer). Should we require them to be nonnull by default and allow people to opt out, like in other languages? Or do we go the inverse way, stick with what we have now (initialized), and make the stronger type state of nonnull opt-in? Slices are quite often null on purpose - what about them? What about consistency between `@safe` and `@system`? There are some unanswered questions that may be better answered after experience with it.

Although I'm in favor of doing things differently than other languages, and making people think about their assumptions for their code, rather than assuming stronger guarantees that may not have been considered.
Mar 29