digitalmars.dip.ideas - Type state analysis

Richard (Rikki) Andrew Cattermole (40/40) Mar 16 This proposal introduces a new semantic pass, that uses an IR to

claptrap (3/6) Mar 17 Any examples of languages that use this and what it provides?

Richard (Rikki) Andrew Cattermole (12/19) Mar 17 into mainstream programming languages. However, many academic projects

Paul Backus (14/18) Mar 17 Yet another proposal for a humongous language feature to solve a

Atila Neves (2/17) Apr 08 I second all of the above.

Richard (Rikki) Andrew Cattermole (15/36) Apr 08 Just so we are clear, DIP1000 is not able to provide memory safety on a

Atila Neves (17/49) Apr 10 DIP1000 depends on lifetimes and RAII, and in a language like D

Richard (Rikki) Andrew Cattermole (7/60) Apr 10 Yes, because of the explicit scope creation. The compiler can see it

Atila Neves (13/75) Apr 10 Unless I'm missing something, that's exactly what DIP1000 does?

Richard (Rikki) Andrew Cattermole (37/65) Apr 10 See my earlier example:

Dukc (16/28) Apr 13 Assuming you meant `destroy(*parent);` instead of destroying only

Walter Bright (24/24) Mar 29 The @live functions do type state analysis - data flow analysis is used ...

Richard (Rikki) Andrew Cattermole (58/92) Mar 29 Yes, typical applications of DFA need to be run on the whole program to

Walter Bright (8/8) Mar 29 In order to make this work, it will need typestate annotations on all:

Richard (Rikki) Andrew Cattermole (17/18) Mar 29 If people had to write annotations yes.

Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:

This proposal introduces a new semantic pass, that uses an IR to
analyse memory transfers and the different states it can be in.

It is the bleading edge on program security, that would enable
other memory analysis solutions such as isolated (immutable
references ala Midori).

It enables:

- To prevent reads to uninitialized variables
- Prevent reads, writes, method calls to null objects
- Logic errors such as trying to read from an unopened file
- Is a framework to analyse unreachable vs reachable variables

It does not affect the type system.
Additional information must be provided by the user if they want
to go beyond the defaults.
The default type state of each variable is meant to be as close
to provable by the compiler, with user assistance like null
checks.

A key premise is that newer edition `` safe`` function will not
be able to call the older `` safe`` function.
This drastically simplifies decisions.

The end goal of this is to enable D to become temporally safe,
which is critical to D's future (see the recent US government
report on memory safety with specific mention towards temporal
safety).
Without this, the framework required to do temporal analysis does
not exist, and what would otherwise be very clear decisions
become arguable at best, at worst complete unknowns.

For each variable declaration I am proposing a default type
state, along with manual specification of the input and output
states it may be in by using the UAX31 Medial character
``identifier'input'output``.

This proposal was born out of necessaity after failure to solve
issues surrounding isolated, and it too required a similar IR to
be built.

I recognize that this could be a bit costly, but due to the
potential performance wins as well as memory and logic safety,
this is an absolute must if we want to be competitive.

Latest:
https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40
Current:
https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0

Mar 16

claptrap <clap trap.com> writes:

On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 This proposal introduces a new semantic pass, that uses an IR 
 to analyse memory transfers and the different states it can be 
 in.

Any examples of languages that use this and what it provides?

Mar 17

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 18/03/2024 12:25 AM, claptrap wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 This proposal introduces a new semantic pass, that uses an IR to 
 analyse memory transfers and the different states it can be in.

 
 Any examples of languages that use this and what it provides?

 From Wikipedia:

 Typestate is an experimental concept that has not yet crossed over 

into mainstream programming languages. However, many academic projects 
actively investigate how to make it more useful as an everyday 
programming technique. Two examples are the Plaid and Obsidian 
languages, which are being developed by Jonathan Aldrich's group at 
Carnegie Mellon University.[16][17] Other examples include the Clara[18] 
language research framework, earlier versions of the Rust language, and 
the >> keyword in ATS.[19]

This is the bleeding edge of research.

https://en.wikipedia.org/wiki/Typestate_analysis

If we get this, we'll be writing the literature.

Mar 17

Paul Backus <snarwin gmail.com> writes:

On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 Latest: 
 https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40
 Current: 
 https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40/a8fffb5725904c6f5d74052d9c974a8f5d453fb0

Yet another proposal for a humongous language feature to solve a 
problem that can already be solved without it.

It is possible to achieve temporal safety in D already with 
`scope` (DIP 1000) and ` system` variables (DIP 1035). The 
ergonomics are not great, but they can be improved (e.g., with 
built-in move-on-last-use).

This proposal specifically has the same problem as ` live`: it 
splits ` safe` code into separate "new ` safe`" and "old ` safe`" 
dialects, which are mutually incompatible.

The ideas themselves are not terrible. I would be interested to 
see what this looks like implemented in a research language. But 
I do not think it has any place in D.

Mar 17

Atila Neves <atila.neves gmail.com> writes:

On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]

 Yet another proposal for a humongous language feature to solve 
 a problem that can already be solved without it.

 It is possible to achieve temporal safety in D already with 
 `scope` (DIP 1000) and ` system` variables (DIP 1035). The 
 ergonomics are not great, but they can be improved (e.g., with 
 built-in move-on-last-use).

 This proposal specifically has the same problem as ` live`: it 
 splits ` safe` code into separate "new ` safe`" and "old 
 ` safe`" dialects, which are mutually incompatible.

 The ideas themselves are not terrible. I would be interested to 
 see what this looks like implemented in a research language. 
 But I do not think it has any place in D.

I second all of the above.

Apr 08

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 09/04/2024 6:18 AM, Atila Neves wrote:
 On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 [...]

 Yet another proposal for a humongous language feature to solve a 
 problem that can already be solved without it.

 It is possible to achieve temporal safety in D already with `scope` 
 (DIP 1000) and ` system` variables (DIP 1035). The ergonomics are not 
 great, but they can be improved (e.g., with built-in move-on-last-use).

 This proposal specifically has the same problem as ` live`: it splits 
 ` safe` code into separate "new ` safe`" and "old ` safe`" dialects, 
 which are mutually incompatible.

 The ideas themselves are not terrible. I would be interested to see 
 what this looks like implemented in a research language. But I do not 
 think it has any place in D.

 
 I second all of the above.

Just so we are clear, DIP1000 is not able to provide memory safety on a 
single thread, let alone multiple.

It is missing the child/parent relationship to prevent:

```d
Parent* parent;
Field* child = &parent.field;
parent.destroy;
Field value = *child;
```

This happens to be one of the things that the DFA is capable of 
tracking, that is not currently possible to implement (Dennis has tried).

It also does not  provide any way of tracking memory from creation to 
ensure it can be transferred safely to another thread in its entirety to 
prevent needing things like locks.

Apr 08

Atila Neves <atila.neves gmail.com> writes:

On Monday, 8 April 2024 at 20:10:24 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 09/04/2024 6:18 AM, Atila Neves wrote:
 On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]

 Yet another proposal for a humongous language feature to 
 solve a problem that can already be solved without it.

 It is possible to achieve temporal safety in D already with 
 `scope` (DIP 1000) and ` system` variables (DIP 1035). The 
 ergonomics are not great, but they can be improved (e.g., 
 with built-in move-on-last-use).

 This proposal specifically has the same problem as ` live`: 
 it splits ` safe` code into separate "new ` safe`" and "old 
 ` safe`" dialects, which are mutually incompatible.

 The ideas themselves are not terrible. I would be interested 
 to see what this looks like implemented in a research 
 language. But I do not think it has any place in D.

 
 I second all of the above.

 Just so we are clear, DIP1000 is not able to provide memory 
 safety on a single thread, let alone multiple.

 It is missing the child/parent relationship to prevent:

 ```d
 Parent* parent;
 Field* child = &parent.field;
 parent.destroy;
 Field value = *child;
 ```

DIP1000 depends on lifetimes and RAII, and in a language like D 
with constructors and destructors I struggle to find a reason to 
write `parent.destroy`. This works as intended, in the sense that 
it fails to compile:

```
     Field* child;
     {
         Parent parent;
         child = &parent.field;
     }
     Field value = *child;
```

It's the same reason why I don't understand "Logic errors such as 
trying to read from an unopened file" - why would a file be 
unopened?

Apr 10

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 11/04/2024 7:17 AM, Atila Neves wrote:
 On Monday, 8 April 2024 at 20:10:24 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 09/04/2024 6:18 AM, Atila Neves wrote:
 On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 [...]

 Yet another proposal for a humongous language feature to solve a 
 problem that can already be solved without it.

 It is possible to achieve temporal safety in D already with `scope` 
 (DIP 1000) and ` system` variables (DIP 1035). The ergonomics are 
 not great, but they can be improved (e.g., with built-in 
 move-on-last-use).

 This proposal specifically has the same problem as ` live`: it 
 splits ` safe` code into separate "new ` safe`" and "old ` safe`" 
 dialects, which are mutually incompatible.

 The ideas themselves are not terrible. I would be interested to see 
 what this looks like implemented in a research language. But I do 
 not think it has any place in D.

 I second all of the above.

 Just so we are clear, DIP1000 is not able to provide memory safety on 
 a single thread, let alone multiple.

 It is missing the child/parent relationship to prevent:

 ```d
 Parent* parent;
 Field* child = &parent.field;
 parent.destroy;
 Field value = *child;
 ```

 
 DIP1000 depends on lifetimes and RAII, and in a language like D with 
 constructors and destructors I struggle to find a reason to write 
 `parent.destroy`. This works as intended, in the sense that it fails to 
 compile:
 
 ```
      Field* child;
      {
          Parent parent;
          child = &parent.field;
      }
      Field value = *child;
 ```

Yes, because of the explicit scope creation. The compiler can see it 
cross scopes, but not within a scope this requires tracking of values 
and their order of invalidation.

 It's the same reason why I don't understand "Logic errors such as trying 
 to read from an unopened file" - why would a file be unopened?

Accidental, different scope states (conditional/function calls), 
explicit closing, events handled automatically by OS.

A lot of reasons.

Apr 10

Atila Neves <atila.neves gmail.com> writes:

On Wednesday, 10 April 2024 at 19:23:17 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 11/04/2024 7:17 AM, Atila Neves wrote:
 On Monday, 8 April 2024 at 20:10:24 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 09/04/2024 6:18 AM, Atila Neves wrote:
 On Sunday, 17 March 2024 at 15:35:40 UTC, Paul Backus wrote:
 On Sunday, 17 March 2024 at 06:51:23 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]

 Yet another proposal for a humongous language feature to 
 solve a problem that can already be solved without it.

 It is possible to achieve temporal safety in D already with 
 `scope` (DIP 1000) and ` system` variables (DIP 1035). The 
 ergonomics are not great, but they can be improved (e.g., 
 with built-in move-on-last-use).

 This proposal specifically has the same problem as ` live`: 
 it splits ` safe` code into separate "new ` safe`" and "old 
 ` safe`" dialects, which are mutually incompatible.

 The ideas themselves are not terrible. I would be 
 interested to see what this looks like implemented in a 
 research language. But I do not think it has any place in D.

 I second all of the above.

 Just so we are clear, DIP1000 is not able to provide memory 
 safety on a single thread, let alone multiple.

 It is missing the child/parent relationship to prevent:

 ```d
 Parent* parent;
 Field* child = &parent.field;
 parent.destroy;
 Field value = *child;
 ```

 
 DIP1000 depends on lifetimes and RAII, and in a language like 
 D with constructors and destructors I struggle to find a 
 reason to write `parent.destroy`. This works as intended, in 
 the sense that it fails to compile:
 
 ```
      Field* child;
      {
          Parent parent;
          child = &parent.field;
      }
      Field value = *child;
 ```

 Yes, because of the explicit scope creation. The compiler can 
 see it cross scopes, but not within a scope this requires 
 tracking of values and their order of invalidation.

Unless I'm missing something, that's exactly what DIP1000 does? 
How else would "order of invalidation" work except through 
scopes? Explicit destruction? Why do that??

 It's the same reason why I don't understand "Logic errors such 
 as trying to read from an unopened file" - why would a file be 
 unopened?

 Accidental, different scope states (conditional/function calls),

What would an example of that look like?

 explicit closing

Doctor, doctor, it hurts when...

, events handled automatically by OS.

Example?

 A lot of reasons.

I haven't encountered any; I'm not saying they don't exist, I'm 
saying that in my experience they're rare enough that they're 
approximately zero. I don't think that's ever happened since I 
stopped writing C. If one creates a file, it's open. If the file 
is no longer in scope, it's closed.

Apr 10

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 11/04/2024 7:31 AM, Atila Neves wrote:
 On Wednesday, 10 April 2024 at 19:23:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Yes, because of the explicit scope creation. The compiler can see it 
 cross scopes, but not within a scope this requires tracking of values 
 and their order of invalidation.

 
 Unless I'm missing something, that's exactly what DIP1000 does? How else 
 would "order of invalidation" work except through scopes? Explicit 
 destruction? Why do that??

See my earlier example:

```d
Parent* parent;
Field* child = &parent.field;
parent.destroy;
Field value = *child;
```

That simply shouldn't compile.

One single scope, and it absolutely should be under DIP1000's purview.

This is critical for owning data structures (which can include memory 
allocators) for it to be solved.

Note: destruction also includes assignments to the variable that had 
contained the data structure.

```d
DS ds = ...;
Item* item = ds[...];
ds = ...;
Item value = *item;
```

Same exact pattern.

 It's the same reason why I don't understand "Logic errors such as 
 trying to read from an unopened file" - why would a file be unopened?

 Accidental, different scope states (conditional/function calls),

 
 What would an example of that look like?

```d
mayCloseOnSpecificInput(file);

file.read(); // may or may not be closed
```

Add an if/else block to it as well, well... Fun.

 explicit closing

 
 Doctor, doctor, it hurts when...

And like goto's we don't allow you to skip variable declarations.

Don't explicitly close and not handle the branches correctly.

Doesn't mean we shouldn't have compiler assistance to prevent you from 
making this mistake.

 , events handled automatically by OS.

 
 Example?

For a socket, peer closed connection. A normally occurring event, that 
kills the handle automaticaly.

 A lot of reasons.

 
 I haven't encountered any; I'm not saying they don't exist, I'm saying 
 that in my experience they're rare enough that they're approximately 
 zero. I don't think that's ever happened since I stopped writing C. If 
 one creates a file, it's open. If the file is no longer in scope, it's 
 closed.

For context: a file representation is the classic example from the 80's 
for type state analysis. However there was a giant thread only a couple 
weeks back wanting nullability type state checks.

Notice how a lot of languages now have nullability analysis in them? 
Yeah. That would be a much better example from a practical stand point.

Apr 10

Dukc <ajieskola gmail.com> writes:

On Wednesday, 10 April 2024 at 19:47:03 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 11/04/2024 7:31 AM, Atila Neves wrote:
 Unless I'm missing something, that's exactly what DIP1000 
 does? How else would "order of invalidation" work except 
 through scopes? Explicit destruction? Why do that??

 See my earlier example:

 ```d
 Parent* parent;
 Field* child = &parent.field;
 parent.destroy;
 Field value = *child;
 ```

 That simply shouldn't compile.

Assuming you meant `destroy(*parent);` instead of destroying only 
the pointer to it.

`destroy` runs the destructor of the object, but it doesn't free 
it. It even sets it to it's `.init` value so that it's known 
exactly what any further calls on the object will do. So your 
example is safe.

I know, what you're saying is that one can't give ` safe` access 
to a field that is deallocated when the struct is destroyed even 
with a scoped reference, because the struct can be destroyed 
before the reference expires.

We've probably been through this before: one can, but only inside 
a callback such as `borrow` for `SafeRefCounted`, that locks the 
data structure preventing destruction while the callback is 
executed.

Apr 13

Walter Bright <newshound2 digitalmars.com> writes:

The  live functions do type state analysis - data flow analysis is used to 
determine if a variable is 'live' or not.

It is indeed costly to do dfa in the front end, that's one reason why it's 
restricted to  live functions.

The dfa could be extended for null checking, but in practice, null checking is 
not that effective:

```
class C { void xx(); }

struct S { C c; }

C mars(C c) { return null; }

void phobos(ref S s)
{
     C c;
     c.xx();       // detected
     mars(c).xx(); // needs whole program DFA to detect
     s.c.xx();     // cannot be detected
}
```

This is why  live functions won't work without scope, ref, and return 
annotations on the functions it interfaces with.  live functions, like Rust, 
also severely limit the kinds of data structures possible.

Other type state analysis currently done in D is:

1. Value Range Propagation

2. whether a field is initialized or not in a constructor is tracked

Mar 29

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 30/03/2024 6:10 AM, Walter Bright wrote:
 The  live functions do type state analysis - data flow analysis is used 
 to determine if a variable is 'live' or not.
 
 It is indeed costly to do dfa in the front end, that's one reason why 
 it's restricted to  live functions.
 
 The dfa could be extended for null checking, but in practice, null 
 checking is not that effective:
 
 ```
 class C { void xx(); }
 
 struct S { C c; }
 
 C mars(C c) { return null; }
 
 void phobos(ref S s)
 {
      C c;
      c.xx();       // detected
      mars(c).xx(); // needs whole program DFA to detect
      s.c.xx();     // cannot be detected
 }
 ```
 
 This is why  live functions won't work without scope, ref, and return 
 annotations on the functions it interfaces with.  live functions, like 
 Rust, also severely limit the kinds of data structures possible.
 
 Other type state analysis currently done in D is:
 
 1. Value Range Propagation
 
 2. whether a field is initialized or not in a constructor is tracked

Yes, typical applications of DFA need to be run on the whole program to 
work as far as I'm aware. ML family compilers do this almost 
exclusively, they cannot do multi-step builds. D however is the opposite.

Given that what I'm suggesting is not the norminal use case for DFA 
(quite the opposite), and we already have an approach thanks to DIP1000 
I am proposing to go function by function.

You must annotate the type states input and output for functions 
parameters and return value if the defaults are less than what you are 
wanting guaranteed.

It acts purely in the form of verification against this.
Inferring only happens within the body of a function, it does not affect 
the signature including for templates (not part of type system).

Because of these decisions it can be parallelized without concern 
(unless you need to run semantic on newly inserted statements for cleanup).

See my recently added example that Rust cannot check since it hasn't got 
type state analysis in production.

https://gist.github.com/rikkimax/eed86a7061445a93f214e41fb6445e40

```d
T* makeNull(T)()  safe {
     return null;
}

void useNull()  safe {
     int* var = makeNull!int();
     // var is in type state initialized as per makeNull return state

     *var = 42;
     // segfault due to var being null
}
```

What we want to happen instead:

```d
T* makeNull(T)(/* return'initialized */)  safe {
     return null;
     // type state default is more than the type state initialized
     // so it is accepted
}

void useNull()  safe {
     int* var = makeNull!int();
     // var is in type state initialized as per MakeNull return state

     // perform load via var variable
     // this will error due to initialized is less than the nonnull type 
state
     // Error: Variable var is in type state initialized which could be 
null, cannot write to it
     *var = 42;
}
```

To fix, simply check for null!

```d
void useNull()  safe {
     int* var = makeNull!int();
     // var is in type state initialized as per MakeNull return state

     if (var !is null) {
         // in scope, assume var is in type state nonnull
         *var = 42;
     }
}
```

Mar 29

Walter Bright <newshound2 digitalmars.com> writes:

In order to make this work, it will need typestate annotations on all:

1. function returns
2. aggregate fields
3. array element types
4. pointed to types
5. function parameters

I suspect that would make it bulletproof. Isn't it a big ask to get people to 
add all those annotations?

Mar 29

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 30/03/2024 3:30 PM, Walter Bright wrote:
 Isn't it a big ask to get people to add all those annotations?

If people had to write annotations yes.

The key will be making sure that they don't have to, the defaults should 
be good enough, or at bare minimum match what we assume today.

What I am unsure about currently is related to pointers (that are not 
the this pointer).

Should we require them to be nonnull by default and allow people to opt 
out like in other languages, or do we go in the inverse and go with what 
we have now and stick to initialized and must be opt-in to the stronger 
type state of nonnull.

Slices are quite often null on purpose, what about them?

What about consistency between  safe and  system?

There are some unanswered questions that may be better answered after 
experience with it. Although I'm in favor of doing things differently 
than other languages and make people think about their assumptions for 
their code, rather than assume stronger guarantees that may not have 
been considered.

Mar 29

D Programming

C/C++ Programming

Other

digitalmars.dip.ideas - Type state analysis