www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Idea: Ownership & lifetime escape analysis by variables in reference

reply Rikki Cattermole <alphaglosined gmail.com> writes:
Walter suggested that I create a thread for an idea I've had for 
memory ownership & lifetimes.

The basic explanation is that every variable (including 
temporaries and returns) are always in reference to other 
variables. So that the order of destruction is correct and 
nothing escapes its owner (they can be bundled unless scope is 
involved).

These examples come from a talk with Timon a couple of days ago 
on Discord. His existing dislike for this idea is that it 
conflates different levels of indirection.

My worked example:

```d
int[] array = [1, 2, 3]; // ownership: []
// arg ownership: [] (not ref/out)
// return ownership: [first argument] -> [array]
int* got = func(array); // ownership: [array]

int* func(int[] source) {
   int[] slice = source[1 .. 2]; // ownership: [source]
   int* ptr = &slice[0]; // ownership: [slice, source] -> [source]
   return ptr; // ownership: [ptr] -> [source]
}
```

A question Timon came up with that I answered:

```d
int[][] func(int[] a, int[] b){
     return [a,b];
}
```

(for return value): ``// ownership: [first argument, second 
argument]``

This ticks a lot of the boxes I'm using as preferable acceptance 
criteria:

1. Inferred for anything non-virtual
2. Nothing outlives its owner
3. Fairly straight forward to understand
May 28
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 29 May 2022 at 00:22:03 UTC, Rikki Cattermole wrote:
 Walter suggested that I create a thread for an idea I've had 
 for memory ownership & lifetimes.

 The basic explanation is that every variable (including 
 temporaries and returns) are always in reference to other 
 variables. So that the order of destruction is correct and 
 nothing escapes its owner (they can be bundled unless scope is 
 involved).

 These examples come from a talk with Timon a couple of days ago 
 on Discord. His existing dislike for this idea is that it 
 conflates different levels of indirection.

 My worked example:

 ```d
 int[] array = [1, 2, 3]; // ownership: []
 // arg ownership: [] (not ref/out)
 // return ownership: [first argument] -> [array]
 int* got = func(array); // ownership: [array]

 int* func(int[] source) {
   int[] slice = source[1 .. 2]; // ownership: [source]
   int* ptr = &slice[0]; // ownership: [slice, source] -> 
 [source]
   return ptr; // ownership: [ptr] -> [source]
 }
 ```

 A question Timon came up with that I answered:

 ```d
 int[][] func(int[] a, int[] b){
     return [a,b];
 }
 ```

 (for return value): ``// ownership: [first argument, second 
 argument]``

 This ticks a lot of the boxes I'm using as preferable 
 acceptance criteria:

 1. Inferred for anything non-virtual
 2. Nothing outlives its owner
 3. Fairly straight forward to understand
Yes, this is a good start. You have one major missing point: you need something to transfers ownership or you end up very limited. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/msr-tr-2012-79.pdf All we need is one god damn type qualifier and we make DIP1000, live, RC object, nogc exception and couple of other long standing problem a thing of the past.
May 28
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 12:26 PM, deadalnix wrote:
 Yes, this is a good start. You have one major missing point: you need 
 something to transfers ownership or you end up very limited.
Can you please write up an example similar to what I wrote, so that we can see what would be added (my mind just go blank on that paper)?
May 28
prev sibling parent drug <drug2004 bk.ru> writes:
On 29.05.2022 03:26, deadalnix wrote:
 Yes, this is a good start. You have one major missing point: you need 
 something to transfers ownership or you end up very limited.
 
 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/0
/msr-tr-2012-79.pdf 
 
 
 All we need is one god damn type qualifier and we make DIP1000,  live, 
 RC object, nogc exception and couple of other long standing problem a 
 thing of the past.
You mean `isolated`? Funny but we almost have this type qualifier - in fact it is opposite/antonym to `shared`, I guess.
May 29
prev sibling next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 00:22:03 UTC, Rikki Cattermole wrote:
 Walter suggested that I create a thread for an idea I've had 
 for memory ownership & lifetimes.

 The basic explanation is that every variable (including 
 temporaries and returns) are always in reference to other 
 variables. So that the order of destruction is correct and 
 nothing escapes its owner (they can be bundled unless scope is 
 involved).
Maybe look at the basic ideas for static analysis? Then you will know the limitations. How do you deal with conditionals? How do you deal with mutable arrays of pointers? Simple stuff like a queue of size two? How do you know what it contains? With lifetimes you can be pessimitic: max lifetime of anything that could in theory be put into it unless I am able to prove otherwise. Then you dont have to try to prove something unless it is needed. Better to just add ARC and do all these things as optimizations.
May 29
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 07:42:29 UTC, Ola Fosheim Grøstad wrote:
 Maybe look at the basic ideas for static analysis? Then you 
 will know the limitations.
Think of it like this: If you try to be precise then parts of it has to happen at runtime. If you want to do it at compile time then you have to overapproximate (be more pessimistic than the possible runtime situations).
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 7:57 PM, Ola Fosheim Grøstad wrote:
 On Sunday, 29 May 2022 at 07:42:29 UTC, Ola Fosheim Grøstad wrote:
 Maybe look at the basic ideas for static analysis? Then you will know 
 the limitations.
Think of it like this: If you try to be precise then parts of it has to happen at runtime. If you want to do it at compile time then you have to overapproximate (be more pessimistic than the possible runtime situations).
My goals for this was to be a very cheap set of checks at compile time, that can catch most naive wrong life time issues, while being easily explained. I'm not concerned too much about preciseness as long as the false positives are low (or at least lower than DIP1000).
May 29
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 08:14:18 UTC, rikki cattermole wrote:
 I'm not concerned too much about preciseness as long as the 
 false positives are low (or at least lower than DIP1000).
It cannot both be simple and low on false positives. Anyway, I dont really like the concept of DIP1000. It basically means that any required change to a function can lead to a situation where one has to slap trusted on it.
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 8:50 PM, Ola Fosheim Grøstad wrote:
 On Sunday, 29 May 2022 at 08:14:18 UTC, rikki cattermole wrote:
 I'm not concerned too much about preciseness as long as the false 
 positives are low (or at least lower than DIP1000).
It cannot both be simple and low on false positives. Anyway, I dont really like the concept of DIP1000. It basically means that any required change to a function can lead to a situation where one has to slap trusted on it.
That's how this originally came up on Discord. I dislike DIP1000 too. We need to go back to the drawing board it isn't archiving what it should be.
May 29
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 08:52:55 UTC, rikki cattermole wrote:
 We need to go back to the drawing board it isn't archiving what 
 it should be.
Yes, trusted is ok when you can limit it to some specific module/class, but if you end up sprinkling it around then there is no point to safe anymore.
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 9:04 PM, Ola Fosheim Grøstad wrote:
 On Sunday, 29 May 2022 at 08:52:55 UTC, rikki cattermole wrote:
 We need to go back to the drawing board it isn't archiving what it 
 should be.
Yes, trusted is ok when you can limit it to some specific module/class, but if you end up sprinkling it around then there is no point to safe anymore.
Worse of all, you end up missing all of the things safe can actually catch. Unless its a wrapper around actual logic, it always worries me that I haven't got that extra guard each time I have to add it.
May 29
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 09:21:16 UTC, rikki cattermole wrote:
 Worse of all, you end up missing all of the things  safe can 
 actually catch.
Yes, and that is why the idea of a simple solution doesn't work. I seldom make mistakes in functions without loops. safe is most valuable in functions that are "complicated" (containing loops or other intricacies). Some overhead in safe is ok, because most applications don't need to be faster than Java in 95% of the code. So if you make 95% safe then you only need to be careful with the remaining 5%. Which is kinda how it works with C now, people write C libraries and use those with high level languages. But it is more convenient with one unified language (and that could make D attractive for Linux GUI applications).
May 29
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 08:52:55 UTC, rikki cattermole wrote:
 I dislike DIP1000 too. We need to go back to the drawing board 
 it isn't archiving what it should be.
I hope I didn't sound too negative, as I am 100% supportive of what you have said in other treads about ARC or adding constraints that enable a more advanced GC. (Or possibly my idea of having ARC for shared and GC for non-shared). It is just not fruitful to try to make system-type code safe as that means Rust or beyond, and those that want that have already moved to Rust. And even if D could do the same as Rust then it would take 15 years to get there and then we compete with next gen static analysis for C++33, Rust 2.0 and Go 3.0… And if D is only going halfway to where Rust is then there will be endless complaints and people will just switch over to Rust to benefit from their ecosystem. So this angle is a *big drain* on language evolution. D could achieve so much more by providing some cool features for making it easier to write trusted code that actually can be trusted. All this work people talk about regarding static analysis and advanced type system features could be done as optimizations on a simpler stable language. With higher payoff.
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 30/05/2022 12:01 AM, Ola Fosheim Grøstad wrote:
 It is just not fruitful to try to make  system-type code  safe
In my snippets here, I've kinda ignored how it should be turned on. I did mention elsewhere that I previously used scope as a type qualifier to do this. Which should of course work in system, but could always be cast away. Just like shared, const or immutable. In general I am in agreement, these sort of safety features shouldn't be opted out of, instead opted into only. Otherwise you're breaking a ton of code without any benefit
May 29
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 12:11:01 UTC, rikki cattermole wrote:
 I did mention elsewhere that I previously used scope as a type 
 qualifier to do this.
I guess what I am saying is something like this: I think you would get better yield for your time investment by pursuing how to add barriers and figuring out how one could write trusted code in such a landscape than anything related to advanced escape analysis in the context of verification of *D code* (optimization is ok, because then you will choose a best effort strategy, verification is all-or-nothing). To understand what is needed for solid escape analysis, let us take the two-element queue example. You could unroll the loop. Replace ```a[0]``` with ```a0```, ```a[1]``` with ```a1``` and all ```a[i]``` accesses with a conditional (or switch). Then analyse this with an algorithm/solver that can deal with branches. Or something along these lines. A lot of machinery even for the simplest datastructure! And then we have the slice scenarios with aliasing, where modifying the content of one slice could modify basically any other parameter. It would be so much easier to just add stack as an optimization hint where you want something to be put on the stack, and let the compiler generate a report of where it failed, why and how to improve the code. (For DIP1000 to make sense it must be a able to deal with more complexity than most of your code, otherwise it will break down whenever you need to modify existing code.)
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 30/05/2022 12:40 AM, Ola Fosheim Grøstad wrote:
 On Sunday, 29 May 2022 at 12:11:01 UTC, rikki cattermole wrote:
 I did mention elsewhere that I previously used scope as a type 
 qualifier to do this.
I guess what I am saying is something like this: I think you would get better yield for your time investment by pursuing how to add barriers and figuring out how one could write trusted code in such a landscape than anything related to advanced escape analysis in the context of verification of *D code* (optimization is ok, because then you will choose a best effort strategy, verification is all-or-nothing).
I'm basically maxed out on this like a year ago. I've only been bringing it up as an example of an alternative to DIP1000 and friends. To show that we could be far closer to a useful solution than what we have now which is certainly not hitting its marks. Write barriers should be a mere glue code problem. That doesn't need a DIP or anything else, just someone who knows the code bases to do it! It would be great to have alternative designs, but whatever is come up with, I think a key design decision is going to be the differentiation between dependency and complimentary of runtime mechanisms. I know you've been wanting more on the dependency side of decisions, whereas I'm more interested in complimentary since I have to stick to things like -betterC.
May 29
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 12:57:01 UTC, rikki cattermole wrote:
 side of decisions, whereas I'm more interested in complimentary 
 since I have to stick to things like -betterC.
ARC could work with -betterC.
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 30/05/2022 1:04 AM, Ola Fosheim Grøstad wrote:
 On Sunday, 29 May 2022 at 12:57:01 UTC, rikki cattermole wrote:
 side of decisions, whereas I'm more interested in complimentary since 
 I have to stick to things like -betterC.
ARC could work with -betterC.
I'm convinced for structs and classes, that's actually a DIP I would like to write. I have written a bit up about what it would be[0]. Without boxing into fat types, I don't see how its going to be possible. The boxing I'm really not happy about, as that is changing known type behaviors and properties which is concerning to me. But hey, I'd love to be proven wrong (not that I've thought about this much)! [0] https://gist.github.com/rikkimax/95994607813d6665b6adbfb36d180c42#file-opref-md
May 29
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 13:11:58 UTC, rikki cattermole wrote:
 But hey, I'd love to be proven wrong (not that I've thought 
 about this much)!
I mean, you could just start by optimizing C++ shared_ptr… Even that would be valuable in the context of -betterC.
May 29
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 7:42 PM, Ola Fosheim Grøstad wrote:
 Maybe look at the basic ideas for static analysis? Then you will know 
 the limitations.
 
 How do you deal with conditionals?
```d Thing func(Thing a, Thing b, bool pickB) { return pickB ? a : b; // ownership: [a, b] } ```
 How do you deal with mutable arrays of pointers?
```d void func(Thing a, Thing b) { int[] arr; // ownership: [] arr ~= a; // arr ownership: [a] arr ~= b; // arr ownership: [a, b] arr = arr[0 .. 1]; // arr ownership: [a, b] arr ~= b; // arr ownership: [a, b] } ```
 Simple stuff like a queue of size two? How do you know what it contains?
See above
 With lifetimes you can be pessimitic: max lifetime of anything that 
 could in theory be put into it unless I am able to prove otherwise. Then 
 you dont have to try to prove something unless it is needed.
 
 Better to just add ARC and do all these things as optimizations.
I too would like to have proper reference counting on structs and classes with optimizations, but the compiler has to know what object to do the reference counting on! Right now things can escape reference counting.
May 29
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 29 May 2022 at 08:03:57 UTC, rikki cattermole wrote:
 On 29/05/2022 7:42 PM, Ola Fosheim Grøstad wrote:
 Simple stuff like a queue of size two? How do you know what it 
 contains?
See above
You will end up being too pessimistic. For instance: if the queue is empty you will assume it is tainted by eveything that went through it... To get to a good place you have to do a good overapproximation.
 I too would like to have proper reference counting on structs 
 and classes with optimizations, but the compiler has to know 
 what object to do the reference counting on! Right now things 
 can escape reference counting.
Impose more limits on safe. Then give trusted and system wrappers and other constructs that allows them to give safe code what it needs. A fat pointer can track ownership when you address fields. Then get rid of fat pointers by optimization.
May 29
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 29/05/2022 7:42 PM, Ola Fosheim Grøstad wrote:
 Simple stuff like a queue of size two? How do you know what it contains?
I've been thinking back on my past thought experiments, how I use to turn this on was with scope as a type qualifier. For non-owning data structures like queues, you would never intentionally mark that as scope since its entries will always have separate lifetimes from that of the data structure itself. So even if implemented, this may not be a general solution that we are looking for.
May 29
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/28/2022 5:22 PM, Rikki Cattermole wrote:
 1. Inferred for anything non-virtual
dip1000 infers `scope` and `return` for: 1. templates 2. lambdas 3. nested functions 4. generated functions i.e. where the source to the function is there. But this isn't so for: int* foo(int*); where the declaration of foo() is separated (maybe in another file) from its definition. Your proposal requires all the source to be available, and no "header" declarations. It also won't work for pointers to functions and delegates. To make those work will require annotations - the same reason D has `scope` and `return` annotations.
May 29
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 30/05/2022 8:26 AM, Walter Bright wrote:
 where the declaration of foo() is separated (maybe in another file) from 
 its definition.
 
 Your proposal requires all the source to be available, and no "header" 
 declarations. It also won't work for pointers to functions and delegates.
 
 To make those work will require annotations - the same reason D has 
 `scope` and `return` annotations.
Indeed. I don't think you can get around having annotations for lifetime stuff for function pointers or virtual methods. You simply are going to need them. Here is a common bit of code I've got: ```d struct Thing { safe nothrow nogc: StringBuilder_ASCII toString(RCAllocator allocator = globalAllocator()) trusted scope { StringBuilder_ASCII ret = StringBuilder_ASCII(allocator); ... return ret; } } ``` The fact that it is called toString is immaterial (just want to point that out). There are two sets of function annotations, at the struct scope level and at the method level. Scope on the method should actually be at the struct level, its not due to[0]. So we can ignore that one as it is unrelated to the problem at hand. But the trusted on the other hand, that is DIP1000 related. Since without it, DIP1000 is seeing ret and even though the this pointer and anything in it, is not stored in anyway inside of ret, ret is inferred as scope and cannot be returned. While I'm sure we can special case this to get it to work, this is a pretty good example of where things that should be working right now, are not. I know in my original post I didn't talk about how to turn this sort of thing on, in the past I used scope as a type qualifier to do it in my thought experiments. Which would work in all D code regardless of function annotations. [0] https://issues.dlang.org/show_bug.cgi?id=23142
May 29
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/29/2022 4:56 PM, rikki cattermole wrote:
 [0] https://issues.dlang.org/show_bug.cgi?id=23142
Thanks for the report, but I don't understand what it has to do with the rest of your post?
May 29
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 30/05/2022 2:00 PM, Walter Bright wrote:
 On 5/29/2022 4:56 PM, rikki cattermole wrote:
 [0] https://issues.dlang.org/show_bug.cgi?id=23142
Thanks for the report, but I don't understand what it has to do with the rest of your post?
Its basically the only non-design related issue I've encountered when working with compiler guarantees in the last year or so. So its worth the mention since it only comes up because I'm using DIP1000 with all the attribute soup that goes along with it.
May 30