www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - DIP1000

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'd like to initiate collaboration on an effort to do DIP1000 rigorously.

First we need to reduce D to a bare subset that only has integers, 
structs, pointers, and functions. That's a working subset of actual D 
code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html.

There is no type deduction, member functions, classes, arrays, 
constructors, loops, etc. etc. etc. Only the ability to create 
arbitrarily complex graphs containing integers and pointers to other nodes.

On this language we need to define typing rules and evaluation 
semantics. Once we have those, we need to prove what we want: for scope 
variables the accessibility never outlives lifetime. As a consequence 
we're good to deallocate them early etc.

The model for typing, evaluation, and proofs is at 
https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf. It would be 
great if those interested in helping could give the paper a close read.

Once we get this done we'll have a fantastic model for any other 
language changes.


Andrei
Aug 30 2016
next sibling parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu 
wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 
 rigorously.

 [...]
If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l'
 (I mean, diagrams like the ones at page 9 of the paper). Then 
you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf. By the way, do you want to formalize with the aforementioned notation the D subset? If that's the case, I'd really like to help (even if I don't have much time).
Aug 30 2016
next sibling parent reply Seb <seb wilzba.ch> writes:
On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta 
wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu 
 wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 
 rigorously.

 [...]
If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)
Aug 30 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/30/2016 12:39 PM, Seb wrote:
 On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:
 I'd like to initiate collaboration on an effort to do DIP1000
 rigorously.

 [...]
If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)
Isn't it amazing how fast we got to a debate on choosing tools... :o) -- Andrei
Aug 30 2016
parent reply Seb <seb wilzba.ch> writes:
On Tuesday, 30 August 2016 at 17:46:41 UTC, Andrei Alexandrescu 
wrote:
 On 08/30/2016 12:39 PM, Seb wrote:
 On Tuesday, 30 August 2016 at 16:27:05 UTC, Lodovico Giaretta 
 wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei 
 Alexandrescu wrote:
 I'd like to initiate collaboration on an effort to do DIP1000
 rigorously.

 [...]
If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Okay this is a bit unrelated to the original question, but it's 2016 and we can do a lot better. It's quite easy to combine LaTeX and Markdowm to get something easy to use and yielding pretty HTML. Have a look at the sources of my blog post from last week: https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md (As I am quite familiar with the setup and would be happy to set it up - including auto-deploy)
Isn't it amazing how fast we got to a debate on choosing tools... :o) -- Andrei
Sorry, please ignore, but it would still be nice to put it on Github, so that we can make pull requests. A couple of nits from a first pass: - ArgumentList is not defined -> ParameterList - The parameters is limited to 0,1,2 - on purpose? (opt -> * would allow arbitrary numbers of parameters) - In contrast to FeatherweightJava the grammar allows a lot of weird stuff like: ref int** foo { 42 = 42; return null; *42 = &42; }
Aug 30 2016
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 30.08.2016 21:27, Seb wrote:
 Sorry, please ignore, but it would still be nice to put it on Github, so
 that we can make pull requests. A couple of nits from a first pass:

 - ArgumentList is not defined -> ParameterList
No, it should be ArgumentList, which is a comma-separated list of expressions.
 - The parameters is limited to 0,1,2 - on purpose? (opt -> * would allow
 arbitrary numbers of parameters)
 - In contrast to FeatherweightJava the grammar allows a lot of weird
 stuff like:

 ref int** foo {
Parentheses are required by the grammar, no?
    42 = 42;
    return null;
    *42 = &42;
 }
Those would be ruled out by type checking.
Aug 30 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/30/2016 12:27 PM, Lodovico Giaretta wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 rigorously.

 [...]
If I may suggest, a repository with some LaTeX code may be a good idea, especially if the idea is to write things like << If gamma derives a with lifetime l, then gamma derives a' with lifetime l' >> (I mean, diagrams like the ones at page 9 of the paper). Then you could use subscripts/superscripts to associate lifetime to expressions in these graphs and upload on your site a nice readable pdf.
Good idea. In fact I'm using a .dd file right now, which can be purposed to generate latex :o).
 By the way, do you want to formalize with the aforementioned notation
 the D subset? If that's the case, I'd really like to help (even if I
 don't have much time).
Correct. Awesome! Andrei
Aug 30 2016
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 30.08.2016 18:12, Andrei Alexandrescu wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 rigorously.

 First we need to reduce D to a bare subset that only has integers,
 structs, pointers, and functions. That's a working subset of actual D
 code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html.
 ...
- Missing a production rule for ArgumentList. - There is no ParameterList of length three or more. - 'return' annotations missing.
 There is no type deduction, member functions, classes, arrays,
 constructors, loops, etc. etc. etc. Only the ability to create
 arbitrarily complex graphs containing integers and pointers to other nodes.

 On this language we need to define typing rules and evaluation
 semantics. Once we have those, we need to prove what we want: for scope
 variables the accessibility never outlives lifetime. As a consequence
 we're good to deallocate them early etc.
 ...
I'd just actually deallocate the memory slots when their lifetime ends and then prove type safety. (This implies in particular that the accessed addressed are still allocated for all dereferencing expressions.)
 The model for typing, evaluation, and proofs is at
 https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf.
Typing and proofs yes, to some extent, evaluation not really. This paper does not allow mutation and does not model memory/aliasing. One possibility would be small-step semantics with an explicit heap. (The heap would be infinite and never reuse memory slots between allocations. This way, memory slots can be reliably marked as already deallocated. The stack would not need to be modeled separately, just allocate non-ref parameters and locals when entering the function and deallocate them when the function exits.)
 It would be
 great if those interested in helping could give the paper a close read.

 Once we get this done we'll have a fantastic model for any other
 language changes.


 Andrei
Aug 30 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/16 3:42 PM, Timon Gehr wrote:
 On 30.08.2016 18:12, Andrei Alexandrescu wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 rigorously.

 First we need to reduce D to a bare subset that only has integers,
 structs, pointers, and functions. That's a working subset of actual D
 code. The grammar I have in mind is at http://erdani.com/d/DIP1000.html.
 ...
- Missing a production rule for ArgumentList. - There is no ParameterList of length three or more. - 'return' annotations missing.
Cool, thx.
 I'd just actually deallocate the memory slots when their lifetime ends
 and then prove type safety. (This implies in particular that the
 accessed addressed are still allocated for all dereferencing expressions.)

 The model for typing, evaluation, and proofs is at
 https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf.
Typing and proofs yes, to some extent, evaluation not really. This paper does not allow mutation and does not model memory/aliasing. One possibility would be small-step semantics with an explicit heap. (The heap would be infinite and never reuse memory slots between allocations. This way, memory slots can be reliably marked as already deallocated.
Yah. There are a few papers on modeling heaps, it's not difficult.
 The stack would not need to be modeled separately, just allocate non-ref
 parameters and locals when entering the function and deallocate them
 when the function exits.)
Ah, neat. Didn't think of that. Thanks. I'm embarking on a trip right now, but will get to this as soon as I can. Andrei
Aug 30 2016
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu 
wrote:
 http://erdani.com/d/DIP1000.html
Return values can't have `scope` annotation? If the container has trusted opAssign, use after free in this case is not accounted for?
Aug 31 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/31/2016 08:05 AM, Kagamin wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu wrote:
 http://erdani.com/d/DIP1000.html
Return values can't have `scope` annotation? If the container has trusted opAssign, use after free in this case is not accounted for?
I don't think scope returns are supported, but I'll think of it. Thanks. -- Andrei
Sep 12 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 12 September 2016 at 14:00:53 UTC, Andrei Alexandrescu 
wrote:
 On 08/31/2016 08:05 AM, Kagamin wrote:
 On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei 
 Alexandrescu wrote:
 http://erdani.com/d/DIP1000.html
That link leads to to a grammar. Isn't DIP1000 the scope and safe DIP ?
Sep 12 2016
prev sibling next sibling parent ZombineDev <petar.p.kirov gmail.com> writes:
On Tuesday, 30 August 2016 at 16:12:19 UTC, Andrei Alexandrescu 
wrote:
 I'd like to initiate collaboration on an effort to do DIP1000 
 rigorously.

 First we need to reduce D to a bare subset that only has 
 integers, structs, pointers, and functions. That's a working 
 subset of actual D code. The grammar I have in mind is at 
 http://erdani.com/d/DIP1000.html.

 There is no type deduction, member functions, classes, arrays, 
 constructors, loops, etc. etc. etc. Only the ability to create 
 arbitrarily complex graphs containing integers and pointers to 
 other nodes.

 On this language we need to define typing rules and evaluation 
 semantics. Once we have those, we need to prove what we want: 
 for scope variables the accessibility never outlives lifetime. 
 As a consequence we're good to deallocate them early etc.

 The model for typing, evaluation, and proofs is at 
 https://www.cis.upenn.edu/~bcpierce/papers/fj-toplas.pdf. It 
 would be great if those interested in helping could give the 
 paper a close read.

 Once we get this done we'll have a fantastic model for any 
 other language changes.


 Andrei
BTW, the Rust Mid-level IR (MIR) has a similar purpose: it lowers the complex AST to a sufficiently simpler one so that one can more easily do things like lifetime analysis / borrow checking on it. Similarly, Swift have a similar step in their compiler pipeline for ARC. I don't know if such intermediate representation in the DMD FE is worth pursuing (obviously would be big development effort), but it may be worth having a look to see how such IR analysis is done in practice (on actual non-toy, non-purely academic languages), even if only for checking that your theoretical model captures sufficient information. Here are some links, which I hope you would find helpful: https://blog.rust-lang.org/2016/04/19/MIR.html (high-level non-technical introduction) https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md (Rust RFC ~ equivalent of DIP) https://news.ycombinator.com/item?id=10487502 (Presentation slides about Swift's compiler pipeline)
Aug 31 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Some progress with the DIP1000 semantics. So I made a few changes to the 
grammar of the mini-language (call it MiniD1000), which is at 
http://erdani.com/d/DIP1000-grammar.html. Notable aspects:

* Programs are a sequence of function definitions, struct definitions, 
and global variable definitions. A pass is supposed to go through all 
and create the initial typing environment.

* Functions now only have one parameter. This is to reduce the size of 
the language (two parameters increase the numbers of typing and 
evaluation rules). Aliasing can be studied with globals. I'm unclear 
whether we need the second parameter for investigating things like 
cross-parameter aliasing; if so, I'll add it back.

* All functions are required to end with a return statement. This avoids 
any need for flow control.

* There are no rvalues in MinbiD1000, which I thought is quite clever. 
That means `null` and integral literals are not expressions; they are 
explicitly present in assignment statements. This simplifies MiniD1000 a 
great deal, but again will require us to be careful when we "port" 
whatever conclusions we draw back to D.

* The `if` statement has no `else` clause. It only increases the size of 
the language without an increase in expressiveness.

Next, I defined a baseline set of typing rules at 
http://erdani.com/d/DIP1000-typing-baseline.html for "bad" MiniD1000: 
all ref, scope and return attributes are ignored, so there's no special 
protection against bad escapes. Based on these rules we'll later define 
better rules that enforce safety. A few notes:

* For a general understanding on how to read these rules, see the 
Featherweight Java (FJ) paper or see e.g. 
http://www.hedonisticlearning.com/posts/understanding-typing-judgments.html.

* Like in FJ, sequences of zero or more things are shown with an overbar.

* In order to identify the current function being compiled, I introduce 
the symbol f_{crt}. For example, in rule S-VarDeclaration, a 
prerequisite of the judgment is that f_{crt} exists, i.e. the rule 
S-VarDeclaration only applies inside a function, not outside. For the 
outside rule see D-GlobalVarDeclaration.

* S-VarDeclaration introduces a new name visible to the statements that 
follow. This is done by typing not only the variable declaration, but 
also all following statements in its scope. The typing environment for 
the statements is the union of the preexisting typing environment and 
the new variable.

* In fact the typing rules of all statements carry the appendix "and 
here are the following statements in the scope" which is workable but 
possibly a bit goofy. Any cleverer idea?

* For that matter I need the rule S-NoStmts: "a sequence of zero 
statements types just fine".

* In D-Struct, I vaguely state that a struct shall not be a transitive 
field of itself. Is there a better way to enforce that succintly?

Please let me know of anything I missed or got wrong. Once the baseline 
typing is in place, the hard part starts: defining the enhanced typing 
that enforces safe scoping rules.


Thanks,

Andrei
Sep 14 2016
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 14.09.2016 21:49, Andrei Alexandrescu wrote:
 Some progress with the DIP1000 semantics. So I made a few changes to the
 grammar of the mini-language (call it MiniD1000), which is at
 http://erdani.com/d/DIP1000-grammar.html. Notable aspects:

 * Programs are a sequence of function definitions, struct definitions,
 and global variable definitions. A pass is supposed to go through all
 and create the initial typing environment.
 ...
There probably should be typing rules for that.
 * Functions now only have one parameter. This is to reduce the size of
 the language (two parameters increase the numbers of typing and
 evaluation rules). Aliasing can be studied with globals. I'm unclear
 whether we need the second parameter for investigating things like
 cross-parameter aliasing; if so, I'll add it back.
 ...
I would support an arbitrary number of parameters. Leaving that out does not buy much in terms of simplicity, and it makes miniD quite a lot less relevant for arguing about the soundness of the full language.
 * All functions are required to end with a return statement. This avoids
 any need for flow control.
 ...
How so? You still allow return statements at arbitrary places in the function.
 * There are no rvalues in MinbiD1000,
Yes there are. Function calls might be rvalues because 'ref' is optional on them.
 which I thought is quite clever.
 That means `null` and integral literals are not expressions; they are
 explicitly present in assignment statements. This simplifies MiniD1000 a
 great deal, but again will require us to be careful when we "port"
 whatever conclusions we draw back to D.
 ...
That should be fine. I think a better way is to special-case only the lhs though. I.e. x=e, *e₁=e₂, f(e₁,…,eₙ)=eₙ₊₁.
 * The `if` statement has no `else` clause. It only increases the size of
 the language without an increase in expressiveness.

 Next, I defined a baseline set of typing rules at
 http://erdani.com/d/DIP1000-typing-baseline.html for "bad" MiniD1000:
 all ref, scope and return attributes are ignored, so there's no special
 protection against bad escapes. Based on these rules we'll later define
 better rules that enforce safety. A few notes:
Looking good, except: - struct types are not checked for existence. E.g. the following program type checks: void main(){ Undefined x; Undefined* y=&x; } - 'fields' is not defined. (And how is it supposed to know what the fields of some syntactic type are?) Probably struct declarations should be part of the environment, then 'fields' can take the environment as an additional argument. I think it would make sense if your grammar enforced that all struct declarations come before all global variables which come before all function declarations. - Functions cannot call each other, because they cannot look up each other's names.
 * For a general understanding on how to read these rules, see the
 Featherweight Java (FJ) paper or see e.g.
 http://www.hedonisticlearning.com/posts/understanding-typing-judgments.html.


 * Like in FJ, sequences of zero or more things are shown with an overbar.

 * In order to identify the current function being compiled, I introduce
 the symbol f_{crt}. For example, in rule S-VarDeclaration, a
 prerequisite of the judgment is that f_{crt} exists, i.e. the rule
 S-VarDeclaration only applies inside a function, not outside. For the
 outside rule see D-GlobalVarDeclaration.

 * S-VarDeclaration introduces a new name visible to the statements that
 follow. This is done by typing not only the variable declaration, but
 also all following statements in its scope. The typing environment for
 the statements is the union of the preexisting typing environment and
 the new variable.

 * In fact the typing rules of all statements carry the appendix "and
 here are the following statements in the scope" which is workable but
 possibly a bit goofy. Any cleverer idea?
 ...
I think it is fine.
 * For that matter I need the rule S-NoStmts: "a sequence of zero
 statements types just fine".

 * In D-Struct, I vaguely state that a struct shall not be a transitive
 field of itself.
Well, the notation does not really make a lot of sense.
 Is there a better way to enforce that succintly?
 ...
Typing rules for structs need to state that all struct declarations together are fine if each struct is fine individually given all other structs present in the environment. Then define something like transitiveFieldTypes(Γ,S) := μX. {T | ∃x. T x; ∈ (fields(Γ,S) ∪ ⋃{ fields(Γ,T') | T'∈X }) } (In English: transitiveFieldTypes(Γ,S) is the smallest set such that all field types of S and field types of other types in the set are also in the set.) You can then require that S∉transitiveFieldTypes(Γ,S)
 Please let me know of anything I missed or got wrong. Once the baseline
 typing is in place, the hard part starts: defining the enhanced typing
 that enforces safe scoping rules.
 ...
Maybe we should define the semantics first (it is easier and it is necessary for providing formal counter-examples for the soundness of proposed typing rules).
Sep 14 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
Well, aliasing can be reproduced with locals

S s;
int* r = getPayload(s);
freePayload(s);
int v = *r; //UAF

Multiparameter functions can be declared to be equivalent to

struct P { S* s; int* r; }
P p;
p.s = &s;
p.r = getPayload(s);
f(p); //as if f(S*,int*)
Sep 15 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/15/2016 04:44 AM, Kagamin wrote:
 Well, aliasing can be reproduced with locals

 S s;
 int* r = getPayload(s);
 freePayload(s);
 int v = *r; //UAF
Nit: in MiniD1000 you'd need to declare vars first, assign them second: S s; int* r; r = getPayload(s); freePayload(s); int v; v = *r; (and there are no comments :o))
 Multiparameter functions can be declared to be equivalent to

 struct P { S* s; int* r; }
 P p;
 p.s = &s;
 p.r = getPayload(s);
 f(p); //as if f(S*,int*)
Thanks for making this point. Andrei
Sep 15 2016
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 15.09.2016 14:57, Andrei Alexandrescu wrote:
 Multiparameter functions can be declared to be equivalent to

 struct P { S* s; int* r; }
 P p;
 p.s = &s;
 p.r = getPayload(s);
 f(p); //as if f(S*,int*)
Thanks for making this point.
This fails when considering scope and return annotations though. Parameters can be annotated or not annotated with scope/return scope independently, but the struct can either be annotated or not annotated, the fields cannot be treated independently (which is one of the weaknesses of DIP1000).
Sep 15 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/15/16 5:14 PM, Timon Gehr wrote:
 struct can either be annotated or not annotated, the fields cannot be
 treated independently (which is one of the weaknesses of DIP1000).
I was planning to allow scope annotations for struct fields. -- Andrei
Sep 15 2016
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 16.09.2016 01:07, Andrei Alexandrescu wrote:
 On 9/15/16 5:14 PM, Timon Gehr wrote:
 struct can either be annotated or not annotated, the fields cannot be
 treated independently (which is one of the weaknesses of DIP1000).
I was planning to allow scope annotations for struct fields. -- Andrei
I see. In that case, single-parameter functions only is probably okay.
Sep 15 2016