
digitalmars.D - Memory/Allocation attributes for variables

reply Elmar <chrehme gmx.de> writes:
Hello dear D community,

personally I'm interested in security, memory safety and such 
although I'm not an expert.
I would like to know whether D has memory/allocation attributes (I 
will use both terms interchangeably), or whether someone knows a 
library for this which statically asserts that only values 
allocated in compatible allocation regions are stored in 
variables of an attributed type. Attributes for memory safety can 
be seen as an extension of the type system: these memory 
attributes constrain the address domain of values instead of the 
value domain itself, and so they become part of the value domain 
of a pointer or reference.

Now there are A LOT of different allocation regions and scopes 
for variables for whatever purpose:

  - static allocation
  - stack frame
  - dynamic allocation w/o GC
  - dynamic allocation with GC
  - fast allocators optimized for specific objects or even 
function-specific allocators
  - peripheral addresses
  - yes, even registers as an allocation region (which allows a 
value to never be stored in RAM and thus not be easily 
overwritten, which is useful for security purposes like storing 
pointer-encryption keys or stack canaries, or to assign common 
registers to variables)
  - or memory-only allocation (which requires a value to never be 
stored/held in registers)

Memory safety problems often boil down to the program 
accidentally storing a pointer value into a variable that is 
*semantically* out of bounds or that points to a memory area too 
small for the variable's purpose or scope. (In languages with 
better type safety this is probably impossible through aliasing 
of variables; typical cases are unbounded data structures like 
C's variadic arguments or attacker-controlled variable-length 
arrays).
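As a concrete illustration (my own minimal example, not from the thread): this is the classic case of a pointer becoming semantically out of bounds, which `@safe` D rejects, most thoroughly with `-preview=dip1000`:

```d
@safe int* escape()
{
    int local = 42;
    // The address of `local` is only valid inside this stack frame.
    // In @safe code (checked most strictly with -preview=dip1000) the
    // next line is a compile-time error; otherwise the caller would
    // receive a dangling pointer into a dead frame.
    return &local;
}
```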

I don't know any other language yet which has allocation 
attributes for pointer/reference variables, plus allocation 
attributes for value-typed variables which restrict the 
allocation region for the data of the variable (a kind of 
contract at variable level). However, value-typed variables are 
another story, because they are allocated at the same time as 
they are defined and would only serve as an expressive 
generalization of such attributes, generalized to value types and 
even control structures.

Looking at what attributes D provides, I definitely see that 
memory-safety-related concerns are addressed with existing 
attributes. But I personally find them rather unintuitive to use, 
difficult for beginners to learn, and not flexible enough (like 
the two different versions of `return ref scope`). As currently 
defined, those attributes don't annotate destinations of data 
flow but sources (like function arguments).

What I imagine: more specific scopes (more specifically 
attributed types) or allocation regions correspond to more 
generalized types. Some scopes/allocation regions are contained 
within others (these smaller contained regions are virtually base 
types of the bigger regions) and some regions are disjoint (but 
they should not intersect each other incompletely, which would go 
against the structured-programming paradigm and the inheritance 
analogy in OOP). This results in scope polymorphism: the address 
type of RHS expressions is checked statically/dynamically during 
assignments, and memory safety becomes a special case of type 
safety.

I could annotate return values with attributes to make clear that 
a function returns GC-allocated memory, e.g. using a `@gc` 
attribute.

```d
@gc string[] stringifyArray(T : U[], U)(T arr) {
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.conv : to;
    return arr.map!(value => value.to!string).array;
}
@nogc auto stringtable = stringifyArray([1, 2, 3]);    // error!

// a useless factory example
@new auto makeBoat(double length, double height, Color c) {
    theAllocator = Mallocator;

    auto b = new Boat(length, height, c);

    theAllocator = processAllocator;
    return b;
}

// combining multiple attributes gives a union of both, which is
// valid for reference variables
@new @newcpp @gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible but would
// require inferring the most appropriate attribute from context, which is difficult
```

Variables with no attributes allow any pointer for assignment and 
will infer the proper attribute from the assignment.

Some of these use cases are already covered by existing 
attributes:

  - `scope` makes sure that a reference argument is not written 
to an allocation region outside the current function block (which 
corresponds to using "`@scope(function)`" with the argument, see 
below), and it would be type-unsafe to assign it to a variable 
type with a larger scope. "Scope" basically means the argument 
belongs to a stack frame in the caller chain. (It corresponds to 
arguments annotated with "`@caller`", see below.) It tells the 
function that the referenced value has a limited lifetime in a 
caller stack frame despite being a reference variable, and that 
the reference could become invalid after the function returns, so 
the function must not write the value to variables outside 
itself. For arguments this is very useful, and I would rather 
prefer the complementary case to be the explicit one. That's 
where `in` is really useful as a short form.
  - `ref` specifies that the actual allocation region of a 
variable's value is outside of the function scope in which the 
variable is visible (or used). (`out` is similar.)
  - `return ref` specifies that the value (referenced by the 
returned reference) is in the same allocation scope as the 
argument annotated with `return ref` (corresponds to annotating 
the return type with "`@scope(argName)`", see below).
  - `return ref scope`, a combination of the two above. The 
return type is considered to have the same allocation region as 
the one used by this annotated argument.
  - `__gshared`, `shared`. Variables with these attributes are 
stored in a scope accessible across threads. This is the default 
in C, so that `__gshared` corresponds to C's volatile values, 
which are accepted by `@memory` references.
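For comparison, a rough sketch of how today's attributes express these constraints in current D (best checked with `-preview=dip1000`; `Container` and `use` are just illustrative names):

```d
@safe struct Container
{
    int[4] data;

    // The `return` annotation ties the returned reference to the
    // lifetime of `this`: the "return ref" case for member functions.
    ref int first() return { return data[0]; }
}

int* global;

@safe void use(scope int* p)
{
    // global = p;   // error with -preview=dip1000:
    //               // scope variable `p` may not be assigned to `global`
}
```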

Here is a (really long) collection of many possible memory 
attributes I am looking for. They define which addresses of 
values are accepted for the pointer/reference:

  - @auto: allocation in any stack frame, which includes 
fixed-size value-type variables passed as arguments or return 
value
  - @stack: dynamic allocation in any stack frame (alloca)
  - @loop: allocation in the region which lives as long as the 
current loop lives
  - @scope(recursion): an allocation scope not yet available in D 
I believe; a scope which lives as long as the entire recursion 
lives, equivalent to `@loop` in the functional sense. Locals in 
this scope are accessible to all recursive calls of the same 
function.
  - @scope(function): allocation in the current stack frame 
(`scope`d arguments are a special case of this)
  - @scope(label): allocation in the scope of the labeled control 
structure
  - @scope(identifier): allocation in the same scope as the 
specified variable name; `return ref` can be seen as a special 
case for return types.
  - @static: allocation/scope in the static memory segment 
(lifetime over the entire program runtime); `static` variables 
and control structures are a special case of this attribute
  - @caller: allocation in the caller's stack frame (usable for 
convenient optimizations like the one shown below), an "implicit 
argument" when used for value types; corresponds to `ref scope` 
for reference-type variables. Something in between `@static` and 
`@auto`.
  - @gc: allocation region managed by D's garbage collector
  - @nogc: disallows pointers/references to GC-allocated data
  - @new: allocation region managed by the Mallocator
  - @newcpp: allocation region managed by the stdcpp allocator, 
which eases C++ compatibility
  - @peripheral: target- or even linker-script-specific memory 
region for peripherals
  - @register: only stored in a register (with a compile-time 
error if not possible)
  - @shared: allocation region for values which are synchronized 
between threads
  - @memory: never stored in a register (the use case can overlap 
with `@peripheral`; it's meant for variables whose content can 
change non-deterministically and must be reloaded from memory 
each time, for example variables modified by interrupt handlers; 
it also prevents optimization when unwanted)
  - @make(allocator): allocated by the given allocator (a dynamic 
type check is required if "allocator" is a dynamic object)

In the basic version for reference variables, these attributes 
statically/dynamically assert that a given pointer value is 
within the bounds of that allocation region. Of course, this is a 
long list of personal ideas and some of them could be unpopular 
in the community. But I think all of them would pay tribute to 
systems programming.

Why are such attributes useful? First of all, type-safe design 
means restricting value domains as much as possible, so that each 
domain is only as large as required. These attributes restrict 
the address (pointer value) at which a value bound to a variable 
can be located, and provide additional static type checks as well 
as *allocation transparency* (something I have missed in every 
language I have used so far). The good thing is that if no 
attribute is provided, it can be inferred from the location where 
a value-typed variable is defined, or from the assigned pointer 
value for reference types.
Maybe also useful: with additional memory-safety attributes, it 
could become legitimate to assign to `scope`d reference variables.

For reference-type variables, these attributes are simple 
value-domain checks on the pointer variable. A disadvantage of 
memory attributes is (as with polymorphism) that runtime checks 
might be needed in some cases where static analysis isn't 
sufficient (e.g. if attributes are cast away).
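D's runtime already offers one such dynamic value-domain check: `core.memory.GC.addrOf` returns `null` for a pointer that does not point into GC-managed memory, which is exactly the kind of runtime test a `@gc`/`@nogc` attribute could fall back on when static analysis fails:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void main()
{
    int* gcPtr = new int(1);                      // GC allocation region
    int* cPtr  = cast(int*) malloc(int.sizeof);   // C heap region
    scope (exit) free(cPtr);

    assert(GC.addrOf(gcPtr) !is null);   // address lies in a GC-managed range
    assert(GC.addrOf(cPtr) is null);     // not GC-allocated
}
```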

An interesting extension is a generalization to value-type 
variables. It can generalize the `scope` and `return` attributes 
to value types. While probably not uncontroversial, it could 
allow fine control over variable allocation and force where a 
value-typed variable is allocated exactly (allocation 
guarantees). You could indirectly define a variable in a nested 
code block which is allocated for another scope. The only real 
disadvantage I can think of is that it cannot be created as just 
a library add-on.

```d
outer: {
    // ...
    @scope(inner) uint key = generateKey(seed);  // precomputes the RHS
    // and initializes the LHS with the memorized value when
    // entering the "inner" block
    seed.destroy();    // do something with seed, modify/destroy it, whatever
    // key is not available/accessible here
    // Message cipher;   // <-- implicit but uninitialized
    inner: if (useSecurity) {
        // if not entered, the init-value of the variable is used
        @scope(outer) Message cipher = encrypt(key);
        // Implicitly defines "cipher" uninitialized in the "outer" scope.
        // Generates default init in all other control-flow paths
        // without a @scope(outer) definition
    }
    //else cipher = Message.init;   // <-- implicit, actual initialization
    decrypt(cipher, key);    // error, key is only available in
                             // the "inner" scope
}
```

Some would criticize the unconventional visibility of `cipher`, 
which doesn't follow common visibility rules. For example, if 
`static` variables are defined in functions, they are still only 
visible in the function itself and not in the entire scope in 
which they live. So a likely improvement would be that the 
visibility is not affected by the attribute, only the point of 
actual creation/destruction. Just looking at the previous 
example, it would seem useless at first, but it's not once loops 
are considered (and variables which have `@loop` scope, that is, 
are created on loop entry and only destructed on loop exit).
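A `@loop`-scoped variable corresponds to what one writes manually today by hoisting the declaration, at the cost of also widening its visibility (`makeBuffer`, `process`, and `items` are hypothetical names):

```d
// Proposed: inside the loop body, `@loop Buffer buf = makeBuffer();`
// would construct once on loop entry and destruct once on loop exit,
// while staying visible only inside the body. The manual equivalent:
{
    auto buf = makeBuffer();          // created on "loop entry"
    foreach (item; items)
    {
        buf.reset();                  // reused across iterations
        process(item, buf);
    }
}                                     // destroyed on "loop exit"
```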

Interesting cases can also emerge for user-directed optimization, 
avoiding costly recomputation by using a larger scope as the 
allocation region:

```d
double permeability(Vec2f direction) {
    @caller Vec2f grad = calculateTextureDerivative();
    // "grad" is a common setup shared by all calls to "permeability"
    // from the same caller instance.
    // It is hidden from the caller because it's an implementation
    // detail of this function.
    // All calls of "permeability" by the same caller will use the
    // same variable.
    // It would be implemented as an invisible argument whose
    // initialization happens in the caller. The variable is stored on
    // the caller's side as an invisible variable and is passed with
    // every call.
    return scalprod(direction, grad);
}
```

A main benefit of this feature is readability and, in some cases, 
optimization, because the executed function is not repeated for 
every call; the repetition happens only when needed, which can be 
computed in the callee instead.
For closures the `@caller` scope is clear, but it also works for 
non-closure functions as an invisible argument. Modifications to 
a `@caller ref` variable are remembered across consecutive calls 
from the same caller stack frame, whereas `@caller` without `ref` 
may only modify a local copy.

Or being able to create arrays easily on the stack, which is yet 
a further extension:

```d
@auto arr1 = [0, 1, 2, 3];      // asserts a fixed size; okay, but a
                                // variable size would fail
@stack arr2 = dynamicArr.dup;   // create a copy on the stack; the
                                // stack is "scope"d
```

An easy but probably limited implementation would set 
`theAllocator` before the initialization of such an attributed 
value-type variable and reset `theAllocator` afterwards to the 
allocator from before.
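With `std.experimental.allocator`, that limited implementation could look like the following sketch; the `scope (exit)` guarantees the reset even when the initialization throws:

```d
import std.experimental.allocator : allocatorObject, dispose, makeArray,
    theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

void withMallocated()
{
    auto saved = theAllocator;
    scope (exit) theAllocator = saved;   // reset to the allocator from before
    theAllocator = allocatorObject(Mallocator.instance);

    // the initialization of the attributed value-type variable runs here
    int[] tmp = theAllocator.makeArray!int(16);
    scope (exit) theAllocator.dispose(tmp);
}
```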


Finally, one could even more generally annotate control 
structures with attributes to define in whose scope's entry the 
control structure's arguments are evaluated (e.g. `static if` is 
a special case which represents `@static if` in terms of 
attributes), but this is yet another story and unrelated to 
allocation.


That's it; I'm sorry for the long post. It took me a while to 
write it down and reread it.
Regards!
May 29 2021
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 30 May 2021 at 02:18:38 UTC, Elmar wrote:
 Looking at what attributes D provides I definitly see that 
 memory safety related concerns are addressed with existing 
 attributes. But I personally find them rather unintuitive to 
 use or difficult to learn for beginners and not flexible enough 
 (like two different versions of `return ref scope`). As 
 currently defined, those attributes don't annotate destinations 
 of data flow but sources (like function arguments) of data flow.
I agree that D has jumped down the rabbit hole in terms of usability, and function signatures are becoming weirder. The reuse of the term "return" is particularly bad.

To a large extent this is the aftermath of changing course when it went from D1 to D2, where simplicity was sacrificed and the door was opened for more and more complexity. Once a language becomes complex, it seems difficult to prevent people from adding just one more feature that adds to the complexity. Also, since experienced users influence the process most... there is nobody to stop it.

The main issue, however, is not specifying where it allocates, but keeping track of it when pointers are put into complex data structures.
May 29 2021
parent reply Elmar <chrehme gmx.de> writes:
On Sunday, 30 May 2021 at 05:13:45 UTC, Ola Fosheim Grostad wrote:
 On Sunday, 30 May 2021 at 02:18:38 UTC, Elmar wrote:
 Looking at what attributes D provides I definitly see that 
 memory safety related concerns are addressed with existing 
 attributes. But I personally find them rather unintuitive to 
 use or difficult to learn for beginners and not flexible 
 enough (like two different versions of `return ref scope`). As 
 currently defined, those attributes don't annotate 
 destinations of data flow but sources (like function 
 arguments) of data flow.
 I agree that D has jumped down the rabbit hole in terms of 
 usability, and function signatures are becoming weirder. The 
 reuse of the term "return" is particularly bad. To a large 
 extent this is the aftermath of changing course when it went 
 from D1 to D2, where simplicity was sacrificed and the door was 
 opened for more and more complexity. Once a language becomes 
 complex, it seems difficult to prevent people from adding just 
 one more feature that adds to the complexity. Also, since 
 experienced users influence the process most... there is nobody 
 to stop it. The main issue, however, is not specifying where it 
 allocates, but keeping track of it when pointers are put into 
 complex data structures.
Thank you for your reply. Also sorry for the wordiness, I'm just awkwardly detailed sometimes.

In your case it's not what I was thinking. I would count myself among the sophisticated programmers (though not the productive ones, unfortunately). I can cope with all those reused keywords, even though I think their design here is unintuitive to use. Intuitive would be annotating the return type, because the aliasing is a property of the return type, not of the argument. At least I feel like I understood the sparsely explained intention behind the current scope-related attributes, but my main point is that I find they can be improved with more expressiveness. It would give programmers a hint of what kind of allocated argument is acceptable for a parameter.

And no, this is not trivial. It's the reason for my decision to start this thread: *Functions in Phobos accept range iterators of fixed-size arrays as range arguments, but even when it fails miserably, it compiles happily and accesses illegal memory without any warning, creating fully non-deterministic results with different compilers. I noticed this when I tried to use "map" with fixed-size arrays. There is simply no tool to check and signal that fixed-size arrays are illegal as a range argument for "map". And sometimes mapping onto fixed-size arrays even works.*

Without better memory-safety tools, I'd discourage more memory-efficient programming techniques in D, although I'd really like to see D replace C on embedded and resource-constrained systems.

---

I wonder how programming languages don't see the obvious: to consider memory safety as part of type safety (address/allocation properties as type properties), and that memory-unsafe code only means an incomplete type system. I also don't know whether conventional "type safety" in programming languages suffices to eliminate the possibility of deadly programming bugs (e.g. aliased reference variables).
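The `map` failure described above can plausibly be reconstructed like this (a guess at the mechanism, not the original code): the lazy range keeps a slice into a stack frame that dies when the function returns.

```d
import std.algorithm.iteration : map;

auto broken()
{
    int[3] fixed = [1, 2, 3];
    // `fixed[]` slices the stack-allocated array; the returned lazy
    // MapResult stores that slice. This compiles without any warning...
    return fixed[].map!(x => x * 2);
    // ...but iterating the result in the caller reads a dead stack frame.
}
```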
But of course, security and safety are complex, and there is no way around complexity to make safe code flexible. The important part is the first one (without the generalization to allocation and control structures, which I only mentioned as an interesting thought), because I think it's an easy but effective addition. D already has features in that direction, which is good, the awareness exists, but it's still weak at some points.

My post should be seen as a collection of ideas and a request for comment (because maybe my ideas are totally bad or don't fit D) rather than a request to implement all of this. The main point is to consider references/pointers as values with critical type safety, which means a way to specify stricter constraints. Memory safety is violated by storing a pointer value in a reference which is outside the intended/reasonable value domain of the pointer (not matching its lifetime).

If someone has already thought the same as me, there could be a safe-pointer-like user library which supports additional attributes (representing restricted pointer domains) by implementing a custom pointer/reference type. (It's not a smart pointer, because smart pointers try to fix the problem at the other end and require dynamic allocation, which is not that nice.) Due to D's nature, it would support safe pointers and safe references (reference variables) and provide static and dynamic type checks with overloaded operators and memory attributes. Attributes couldn't be inferred automatically, I guess, but annotating variables could entirely allow static memory-safety checks (which don't need to explicitly test whether a pointer value is contained in a set of allowed values) and maybe prevent bugs or unwanted side effects.

---

One important aspect which I forgot: aliasing of variables. I know, D allows aliased references as arguments by default. Many memory-safety problems derive from aliased variables which were not assumed to be aliased.
Aliased variables complicate formal verification of code and confuse people. I would add `@alias(symbol)` to my collection, which indicates that a reference explicitly aliases (overlaps) another reference in memory, or a `@noalias(symbol)`.

---

If someone thinks I heavily missed something, please let me know.
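As an aside, a minimal sketch of the safe-pointer library type mentioned above; `Region` and `SafePtr` are hypothetical names, and the region check here is purely static:

```d
enum Region { gc, malloc, stack }

// Library pointer type carrying its allocation region in the type.
struct SafePtr(T, Region region)
{
    T* ptr;

    // Only a pointer from the same region may be assigned;
    // a mismatch is a compile-time error instead of a latent bug.
    void opAssign(Region r)(SafePtr!(T, r) rhs)
    {
        static assert(r == region, "allocation region mismatch");
        ptr = rhs.ptr;
    }
}

void demo()
{
    auto a = SafePtr!(int, Region.gc)(new int(5));
    SafePtr!(int, Region.malloc) b;
    // a = b;   // compile error: static assert "allocation region mismatch"
}
```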
May 31 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:
 I wonder how programming languages don't see the obvious, to 
 consider memory safety as a part of type safety 
 (address/allocation properties to be type properties) and that 
 memory unsafe code only means an incomplete type system.
All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.
 constraints. Memory safety is violated by storing a pointer 
 value in a reference which is out of the intended/reasonable 
 value domain of the pointer (not matching its lifetime).
But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining. And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?
 One important aspect which I forgot: aliasing of variables. I 
 know, D allows aliased references as arguments by default. Many 
 memory safety problems derive from aliased variables which were 
 not assumed to be aliased.
So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?
May 31 2021
parent reply Elmar <chrehme gmx.de> writes:
Good questions :-) .

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
 On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:
 I wonder how programming languages don't see the obvious, to 
 consider memory safety as a part of type safety 
 (address/allocation properties to be type properties) and that 
 memory unsafe code only means an incomplete type system.
All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.
I suppose you mean the "higher"-level languages (because C is by its original definition also a high-level language). Nor do I know any "higher"-level language which provides the flexibility of constraining the value domain of a pointer/reference, except for restricting `null` (non-nullable pointers are probably the simplest domain constraint for pointers/references). I think not even Ada or VHDL have it.

The thing I'd like to gain with those attributes is a guarantee that the referenced value wasn't allocated in a certain address region/scope and lives in a lifetime-compatible scope, which can be detected by checking the pointer value against an interval or a range of intervals. For example, a returned reference to an integer could have been created with "malloc" or even a C++ allocator, or interfacing functions could annotate parameters with such attributes. With guarantees about the scope of arguments, function implementations can avoid buggy reference assignments to outside variables. The function could expect compatible references allocated with the GC, but the caller doesn't know it.

Whether a reference-variable assignment is legitimate can be checked by comparing the source attributes (of the reference value, which say where the value is allocated) with the destination attributes (where the reference is stored in memory). Even better are runtime checks of pointer values for a better degree of memory safety, but only if the programmers want to use them. A reference assignment is legitimate if the destination scope is compatible with the source's scope, and in no other case.

I would suggest a lifetime rating for value addresses as follows: *peripheral > system/kernel > global shared > private global (TLS) > extern global (TLS) > shared GC-allocated > shared dynamically allocated > GC-allocated (TLS) > dynamically allocated (TLS) <=> RAII/scoped/stack > register*

Heap regions are not always comparable to stack or RAII. 
So the current practice of not allowing assignment to RAII references (using the `scope` attribute) is probably best to continue. Everything other than stack addresses is seen as one single lifetime region with equal lifetime. The comparison between stack addresses assumes that an address deeper in the stack has a higher or equal lifetime. The caller could also provide its stack-frame bounds, which allows considering this interval as one single lifetime.

It should constrain the possible value domain of pointers absolutely, so that no attack with counterfeited pointers to certain memory addresses is possible. If I used custom allocators for different types, I could expect or delimit what the pointer value can be.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
 constraints. Memory safety is violated by storing a pointer 
 value in a reference which is out of the intended/reasonable 
 value domain of the pointer (not matching its lifetime).
But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining. And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?
I think this is GC-related memory management, not type checking. The memory attributes don't solve memory-management problems.

The problem with reference counting is usually solved by inserting weak pointers into cycles (which also resolves the apparent contradiction of a cycle of references). Weak references are used by those objects which are deeper in the graph of data links. Otherwise it's a code smell, and one could refactor the links into a joint object so that deleted objects deregister in this joint object. I have already thought about other allocation schemes for detecting cycles that could be combined with reference counting, for example tagging structs/classes with the ID of the connected graph in which they are linked, if they aren't leaves. But this ID is difficult to change. One could also analyze at compile time which pointers can be part of a cycle at all, but more explanation would lead too far here.

Instead, the problem my idea is intended to solve is

1. giving hints to programmers (to know which kind of allocated memory works with an implementation; stack addresses apparently won't generally work with `map`, for example), and

2. having static or dynamic (simple) value-domain checks (which test whether a pointer value is in the allowed interval(s) of the allocation address spaces belonging to the attributes), which ensure that only allowed kinds of allocation are used. These checks can be used to statically or dynamically dispatch functions. Of course such a check could also be performed manually, but it's tedious and requires putting all the different function bodies in one `static if else`.

It's more of a lightweight solution and works like an ordinary type check (a value-in-range check). Where the feature shines most is function signatures, because they separate code and create intransparency, which can be countered by memory attributes on the return type and argument types.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
 One important aspect which I forgot: aliasing of variables. I 
 know, D allows aliased references as arguments by default. 
 Many memory safety problems derive from aliased variables 
 which were not assumed to be aliased.
So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?
Okay, I didn't define aliasing. By "aliasing" I mean that "aliasing references" (or pointers) either point to the exact same address or that the immediately pointed-to classes/structs (pointed to by the reference/pointer) overlap in memory. I would consider anything else more complicated than necessary. The definition doesn't care about further indirections.

I often consider only the directly pointed-to contiguous chunk of memory of a struct or class as "the type". If I code a function, I'm usually only interested in the top level of the type (its "root node"), and further indirections are handled by nested function calls. For example, it suffices if two argument slices are not overlapping, and for that I only need to check aliasing as just defined.

If you really would like two arguments (graphs) to not share any single pointer value, I would suggest using a more appropriate type than a memory attribute: a type which is recursively "unique" (in terms of only using "unique pointers"). Do you think it sounds like a nice idea to have a data-structure attribute `unique`, next to `abstract` and `final`, which recursively guarantees that any reference or pointer is a unique pointer?

If you are interested in an algorithmic answer to your questions, then the best approach I can quickly think of is creating an appropriate hash table from all pointers in one graph and testing all pointers in the other graph against it (if I cannot use any properties of the pointers' values, e.g. that certain types and all indirections are allocated in specific pools). But that only works with exactly equal pointer values.
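The hash-table approach can be sketched directly in D using an associative array as a pointer set (`Node` is an assumed graph type):

```d
struct Node { Node*[] children; }

// Collect every reachable node pointer of a graph into a set.
void collect(Node* n, ref bool[Node*] seen)
{
    if (n is null || n in seen) return;
    seen[n] = true;
    foreach (c; n.children) collect(c, seen);
}

// True if the two graphs share at least one node (exact pointer equality).
bool sharesNodes(Node* a, Node* b)
{
    bool[Node*] seenA;
    collect(a, seenA);
    bool[Node*] seenB;
    collect(b, seenB);
    foreach (p; seenB.byKey)
        if (p in seenA) return true;
    return false;
}
```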
May 31 2021
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 1 June 2021 at 00:36:17 UTC, Elmar wrote:
 On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad 
 wrote:
 I suppose you mean the "higher" level languages (because C is 
 by original definition also a high-level language).
Yes, I mean system level language vs proper high level languages that abstract away the hardware. "Low level" is not correct, but the usage of the term "system level" tend to lead to debates in these fora as there is a rift between low level and high level programmers...
 The thing I'd like to gain with those attributes is a 
 guarantee, that the referenced value wasn't allocated in a 
 certain address region/scope and lives in a lifetime-compatible 
 scope which can be detected by checking the pointer value 
 against an interval or a range of intervals. For example a 
 returned reference to an integer could have been created with 
 "malloc" or even a C++ allocator or interfacing functions could 
 annotate parameters with such attributes.
Well, I guess you are new here, but Walter will refuse having many pointer types. Even the simple distinction between GC and raw pointers will be refused. The reason is that it would lead to a combinatorial explosion of function instances and prevent separate compilation. So for D, this is not a probable solution. That means you cannot do it through the regular type system, so you will have to do shape analysis of data structures.

I personally have in the past argued that it would be an interesting experiment to make all functions templates and template the pointer parameter types. That you can do with library pointer types, as a proof of concept, yourself. Then you will see what the effect is.
 lifetime region with equal lifetime. The comparison between 
 stack addresses assumes that an address deeper in the stack has 
 a higher or equal lifetime. The caller could also provide it's 
 stack frame bounds which allows to consider this interval as 
 one single lifetime.
How about coroutines? Now you have multiple stacks.
 I think, this is GC-related memory management, not type 
 checking. The memory attributes don't solve memory management 
 problems. The problem with reference counting usually is solved 
 by inserting weak pointers into cycles (which also solves the 
 apparent contradiction of a cycle of references). Weak 
 references are used by those objects which are deeper in the 
 graph of data links.
No, depth does not work: you could define an acyclic graph of owning pointers and then use weak pointers elsewhere. Restricting by depth constrains modelling and algorithms. So a compiler-verified restriction of non-weak references might be too restrictive? Basically, whenever a non-weak reference changes, the compiler would have to prove that the graph of non-weak references is still acyclic. Maybe possible, but it does not sound trivial.
  2. having static or dynamic (simple) value domain checks 
 (which checks whether a pointer value is in the allowed 
 interval(s) of the allocation address spaces belonging to the 
 attributes) which ensures that only allowed types of allocation 
 are used. These checks can be used to statically or dynamically 
 dispatch functions. Of course such a check could also be 
 performed manually but it's tedious and requires me to put all 
 different function bodies in one `static if else`.
Dynamic checks are unlikely to be accepted; I suggest you do this as a library.
 Where the feature shines most is function signatures because 
 they separate code and create intransparency which can be 
 countered by memory attributes for return type and argument 
 types.
Unfortunately, this is also why it will be rejected.
 Okay, I didn't define aliasing. With "aliasing" I mean that 
 "aliasing references" (or pointers) either point to the exact 
 same address or that the immediately pointed class/struct 
 (pointed to by the reference/pointer) does not overlap. I would 
 consider anything else more complicated than necessary.
Insufficient for D with library container types and library smart pointers.
 Do you think, it sounds like a nice idea to have a data 
 structure attribute `unique` next to `abstract` and `final` 
 which recursively guarantees that any reference or pointer is a 
 unique pointer?
Yes, some want isolated pointers, but you have to do all this stuff as library smart pointers in D.
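A library smart pointer along those lines could look roughly like the following sketch. `Unique` here is an invented name (D's standard library has `std.typecons.Unique` with a different shape); the payload's own destructor is skipped for brevity:

```D
// Minimal sketch of a unique-ownership pointer as a library type:
// copying is disabled, ownership moves explicitly, the destructor frees.
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

struct Unique(T)
{
    private T* ptr;

    @disable this(this);          // no copies: ownership stays unique

    static Unique make(Args...)(Args args)
    {
        auto p = cast(T*) malloc(T.sizeof);
        emplace(p, args);         // construct T in the raw memory
        return Unique(p);
    }

    ~this()
    {
        if (ptr) { free(ptr); ptr = null; }
    }

    ref T get() { return *ptr; }
}

void main()
{
    auto u = Unique!int.make(7);
    assert(u.get == 7);
    // auto v = u;                // error: postblit is disabled
    import std.algorithm.mutation : move;
    auto v = move(u);             // ownership transfer must be explicit
    assert(v.get == 7);
}
```

A recursively `unique` data structure would then be one whose fields only ever hold such pointers, which a library can encourage but not enforce the way a language attribute could.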
May 31 2021
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad 
wrote:
 No, depth does not work, you could define an acyclic graph of 
 owning pointers and then use weak pointers elsewhere.
What I meant here is that ordering by depth would be too restrictive, as it would prevent reasonable insertion of nodes.
May 31 2021
prev sibling parent reply Elmar <chrehme gmx.de> writes:
Thank you for answering.

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad 
wrote:
...
 The thing I'd like to gain with those attributes is a 
 guarantee, that the referenced value wasn't allocated in a 
 certain address region/scope and lives in a 
 lifetime-compatible scope which can be detected by checking 
 the pointer value against an interval or a range of intervals. 
 For example a returned reference to an integer could have been 
 created with "malloc" or even a C++ allocator or interfacing 
 functions could annotate parameters with such attributes.
Well, I guess you are new, but Walter will refuse having many pointer types. Even the simple distinction between gc and raw pointers will be refused. The reason being that it would lead to a combinatorial explosion of function instances and prevent separate compilation.
The separate compilation is a good point. Binary compatibility is a common property considered for security safeguards. But at least static checking with attributes would need no memory addresses at all (especially if the compiler can infer the attribute for every value-typed variable automatically from where it is defined).

Dynamic checks of pointers across binary interfaces are difficult. They would work flawlessly with library-internal memory regions, but for outside pointer values they could only rely on runtime information (the memory regions used by allocators) or could not perform checks at all (because the address ranges to check against are unknown). It would work better if binaries supported relocations for application-related memory addresses which are filled in at link time. Static checks strike the balance here.
 I personally have in the past argued that it would be an 
 interesting experiment to make all functions templates and 
 template pointer parameter types.

 That you can do with library pointer types, as a proof of 
 concept, yourself. Then you will see what the effect is.
Okay, that's fine. Pointers in D are not debatable, I would not try. I think any new language should remove the concept of pointers entirely rather than introduce new pointer kinds. Pointers from C should be treated as reference variables; pointers passed to C should be either an unbounded slice (if bounded, there should be another `size_t` argument to the function) or addresses obtained from variables.

As a C programmer I'd say that C's pointer concept was never needed as it stands; it was created as an unsafe reference variable + a reference + an iterator in one all-in-one solution, the simplest generic thing that beats them all (without the pointer type revealing the use case). Attributes would only check the properness of pointer-value assignments, without duplicating the function body the way `auto ref` does. (One can still interpret them as part of the type.)

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
 lifetime region with equal lifetime. The comparison between 
 stack addresses assumes that an address deeper in the stack 
 has a higher or equal lifetime. The caller could also provide 
 it's stack frame bounds which allows to consider this interval 
 as one single lifetime.
How about coroutines? Now you have multiple stacks.
Thanks, I missed that; at least true coroutines have multiple stacks. Other things can also dissect stack-frame memory (function-specific allocators in the stack region). But in our case it's already a question whether such special stack frames should still be allocated in the stack region, statically (as I implemented it once for C), or in a heap region (like the stack frames of continuations). You could at least place coroutine stack frames in some allocator region in static memory.

A probably less fragile but more costly solution (when checking stack addresses) for stack-address scope would be storing the stack depth of an address in the upper k-bit portion of a wide pointer value (enabling a simple check), but this is only a further, unrelated idea.
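The k-bit side idea can be sketched like this. The 16-bit tag width and all helper names are assumptions for illustration; a real implementation would also have to mask the tag off before dereferencing:

```D
// Sketch: tag a pointer with a "stack depth" in its upper bits so that
// lifetime comparisons reduce to integer comparisons. Purely illustrative.
enum tagBits  = 16;
enum tagShift = 64 - tagBits;
enum addrMask = (1UL << tagShift) - 1;

ulong tagPointer(void* p, ushort depth)
{
    return (cast(ulong) p & addrMask) | (cast(ulong) depth << tagShift);
}

ushort depthOf(ulong tagged)   { return cast(ushort)(tagged >> tagShift); }
void*  addressOf(ulong tagged) { return cast(void*)(tagged & addrMask); }

// An address from a younger (deeper) frame must not be stored in a
// variable belonging to an older (shallower) frame:
bool assignmentAllowed(ulong target, ulong source)
{
    return depthOf(source) <= depthOf(target);
}

void main()
{
    int x;
    auto older   = tagPointer(&x, 1);  // shallow frame, longer lifetime
    auto younger = tagPointer(&x, 5);  // deep frame, shorter lifetime
    assert(assignmentAllowed(younger, older));   // older into younger: ok
    assert(!assignmentAllowed(older, younger));  // younger into older: rejected
}
```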
 Dynamic checks are unlikely to be accepted, I suggest you do 
 this as a library.
Right, if nobody has tried it so far, I'd like to try it myself. Then I can firm up my D experience with further practice. I'd compare the nature of static and dynamic attribute checks to the nature of C++ `static_cast` and `dynamic_cast` of class pointers. I was thinking such a user library could use `__traits` with templated operator overloads.
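The dynamic flavour of such a library check could start from something like this sketch: a wrapper whose assignment operator validates that the pointer value lies inside an allowed address interval. The interval here is supplied by hand; a real library would query the allocator for its regions:

```D
// Illustrative only: a pointer wrapper that range-checks every assignment.
struct CheckedPtr(T)
{
    T* ptr;
    void* lo, hi;   // allowed half-open address interval [lo, hi)

    void opAssign(T* p)
    {
        assert(cast(void*) p >= lo && cast(void*) p < hi,
               "pointer outside its allowed allocation region");
        ptr = p;
    }
}

void main()
{
    static int[4] pool;          // pretend this is our allocation region
    CheckedPtr!int cp;
    cp.lo = pool.ptr;
    cp.hi = pool.ptr + pool.length;
    cp = &pool[2];               // inside the region: accepted
    assert(cp.ptr == &pool[2]);
    // cp = new int;             // a GC address would trip the assert
}
```

The static flavour would replace the runtime `assert` with a template constraint on where the right-hand side was allocated, which is the harder part.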
 Where the feature shines most is function signatures because 
 they separate code and create intransparency which can be 
 countered by memory attributes for return type and argument 
 types.
Unfortunately, this is also why it will be rejected.
So, is it D's tenor that function signatures are thought to create *in*transparency and should continue to do so? Does the community think allocation and memory transparency is a bad thing, or just not needed? IMO, allocation and memory transparency is relevant to being a serious systems programming language (even Systems Programming :-D ).

Isn't the missing memory transparency from outside of functions the reason why global variables are frowned upon by many? As with referential transparency (side effects), less transparency makes programs harder to debug and decouple, and APIs harder to use correctly. (Just consider the single `map` issue with fixed-size arrays...)
 Okay, I didn't define aliasing. With "aliasing" I mean that 
 "aliasing references" (or pointers) either point to the exact 
 same address or that the immediately pointed class/struct 
 (pointed to by the reference/pointer) does not overlap. I 
 would consider anything else more complicated than necessary.
Insufficient for D with library container types and library smart pointers.
Yeah. It makes no sense if we consider the pointer layers between the exposed pointer and the actual data (I assume smart pointers in D are implemented with such a middle layer in between). But if it only means the first payload-data layer, the one representing the actual root node of any graph-like data structure, is it still flawed? At least, if I can annotate all pointer variables in my data structures, and checks are done for every single reference/pointer assignment on every access so that no pointer-value range in the entire structure is ever violated, isn't that closer to memory safety than without? Of course, I could still pass references to those pointers into a binary which writes into them without knowing any type information, but that's a deliberate risk which static type checking cannot mitigate; only dynamic value checking of the pointed-to data after function return could (probably another useful safety feature for my idea).

Of course attributes are optional; nobody has to annotate anything, at the risk of obtaining falsely scoped pointer values. But would you agree it would be better than not having it? It doesn't make everything safe, particularly if one can omit it, but annotating variables with attributes could help with ownership (I think in a better design than Walter's proposal of yet another function attribute, `@live`, instead of a variable attribute). With ownership I mean preventing leakage of (sensitive) data out of a function (not just reference values, as with `scope`); it could provide some sanity checks and even more transparency for API use (because then I can see what kind of allocated memory I can expect for parameters and the return value). I think it could improve interfacing with C++ as well.

In the end, I only want certainty about the references and pointers when I look at a function signature. I probably should (try to) implement it myself as a proof of concept.

Regards, Elmar
Jun 03 2021
next sibling parent reply sighoya <sighoya gmail.com> writes:
On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
```D
 stack arr2 = dynamicArr.dup;   // create a copy on stack, the 
stack is "scope"d
```
An easy but probably limited implementation would set `theAllocator` before the initialization of such an attributed value-type variable and reset `theAllocator` afterwards to the allocator from before.
What if `dup` creates things on the heap (I don't know, by the way)? You would need to make the allocator dynamically scoped.
```D
 register: only stored in a register (with compile-time error if 
not possible)
```
Only if you have complete control over the backend.

Besides the combinatorial explosion in the required checking logic: what happens if we copy/move data between variables with different memory annotations, e.g. nogc to gc, or newcpp to gc? Do we auto-copy, cast, or throw an error? If we do not throw an error, an annotation might not only restrict access but also change semantics by introducing new references. So annotations become implied actions; that can be okay, but is probably hard to accept given the current uses of annotations.
Does the community think, allocation and memory transparency is 
a bad thing or just not needed? IMO, allocation and memory 
transparency is relevant to being a serious Systems programming 
language (even though C doesn't have it, C++ doesn't have it and 

There is no such thing as memory transparency, strictly speaking. Even if you want to allocate things on the stack, what if your backend doesn't have a stack at all? Or we just rename the heap to "stack"? In the end we aren't doing much better than C or high-level languages; we just have the heuristic that our structures map better onto the underlying hardware.
As a C programmer I'd say that C's pointer concept was never 
needed as it stands, it just was created to be an unsafe 
reference variable + a reference + an iterator 
all-in-one-solution as the simplest generic thing which beats it 
all (without knowing the use case by looking at the pointer 
type).
Well, I think having both is problematic/complex. But C has only one of those and C++ has both. It's also not quite clear where arrays belong, so that's a mistake.
I think, any new language should remove the concept of pointers 
entirely rather than introducing new pointers.
Why not remove the distinction between values and references/pointers altogether? But I think that drifts too hard toward a logically high-level language and isn't the right way to go in a system-level language, although it is very interesting.

Annotations seem neat, but they parametrize your code:

```D
@allocator("X") @lifetime("param1", "greater", "param2")
void f(Type1 param1, Type2 param2)
```

becomes

```D
void f(Allocator X, Lifetime lifetime(param1), Lifetime lifetime(param2))(Type1 param1, Type2 param2)
    if (currentAllocator == X && lifetime(param1) >= lifetime(param2)) {...}
```

which literally turns every function that allocates something into a template, increasing "templatism", unless we get runtime generics as in Swift.

To summarize, I find these improvements interesting, but
- they don't feel system level
- are they at all possible in a 20-year-old language?

Some info: Rust distracts me in that it is a mix of high level and low level. They have values and references for their ownership/borrowing system, but then also custom pointer types which don't interact well with the former.
Jun 03 2021
parent reply Elmar <chrehme gmx.de> writes:
Thank you for your input.


On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 Beside the combinatorial explosion in the required logic to 
 check for, what happens if we copy/moving data between 
 different memory annotated variables, e.g. nogc to gc, newcpp 
 to gc.
 Did we auto copy, cast or throw an error. If we do not throw an 
 error, an annotation might not only restrict access but also 
 change semantics by introducing new references.
 So annotations become implied actions, that can be ok but is 
 eventually hard to accept for the current uses of annotations.
There is no combinatorial explosion; that would be a bad design ;-). Annotated references behave like a superclass of non-annotated references, or rather, a subset of attributes is a superclass of a superset of attributes. The best description of the effect (in the dynamic case) would be viewing memory attributes as a precondition which requires the address value to lie in certain interval(s). Currently attributes only have compile-time semantics, you said, so a static check would fit, right?

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 There is no such thing as memory transparency, strictly 
 speaking, even if you want to allocate things on the stack, 
 what is if your backend doesn't have a stack at all? Or we just 
 rename the heap to stack?
Okay, "memory transparency" is a bad name; it could suggest that actual memory addresses are revealed. I mean "allocation" or "scope" transparency.

Concerning the call stack: languages which don't provide the abstraction of scoped variables (which is implemented by a call stack) basically only have global variables + registers. I'm currently not aware of any high-level language or processor which doesn't support that abstraction, because it's the most basic abstraction of any high-level language. If you have "functions", then you also have a call stack, or let's call it "automatic scope". It doesn't matter whether automatic scope is allocated in the heap area (which can happen with closures and continuations), in the static memory area (for non-recursive functions) or in its own area at the end of the memory layout; it only matters that it's automatically managed by the function. I also think that CPUs which don't support a call stack cannot be programmed with D at all.

If attributes are used with static checks, the actual memory-address value doesn't matter, only the location in source code where a value was allocated, or the attributes it gets from the user. The automatic lifetime is the criterion that distinguishes it from heap, GC or static memory.

For dynamic checks I indeed made the assumption that in real programs the actual lifetime/scope can be inferred from memory addresses, because allocation regions of related scope usually put variables in common memory areas (at least in common memory segments). This would turn pointer types into value ranges instead of unconstrained 32-bit integers. Ultimately, information from a linker script could be needed for authentic dynamic checks (using relocated addresses for checking); I could imagine this to be difficult on top. Data from stack frames in the heap would be treated as dynamically allocated and data from static stack frames would be treated as stack. This could lead to unexpected results and false errors unless more information is passed with the pointer. Dynamic checks would require a separate implementation (a separate type) which memorizes, in a few bits, the allocation scope in which a value was created. Eventually, the dynamic solution is less lightweight in memory, but it makes the value check easier.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 Well, I think having both is problematic/complex. But C has 
 only one of those and C++ has both.
 It's not quite correct what arrays belong, so that's a mistake.
You mean references and pointers, right? References (from C++) are immutable pointers (in theory). C++ has pointers for backwards compatibility (and probably because the designer originally didn't understand the problem), but they are now discouraged from being used as "raw pointers" (when I write "pointer" I mean "raw pointer"). (Raw) pointers are modifiable "reference variables" (like the variables in Java) which additionally provide access to the pointer address and allow modifying it. Reference variables, however, don't allow casting to non-pointer types. Arrays in C and C++ are actually more like C++ references, i.e. (locally) immutable pointers.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 Annotations seam to be neat, but they parametrize your code:

 ```D
  allocator("X"),  lifetime("param1", "greather", "param2") void 
 f(Type1 param1, Type2 param2)
 ```

 becomes

 ```D
 void f(Allocator X,Lifetime lifetime(param1), Lifetime 
 lifetime(param2))(Type1 param1, Type2 param2) if 
 currentAllocator=X && lifetime(param1)>=lifetime(param2) {...}
 ```

 which literally turns every function allocating something into 
 a template increasing "templatism" unless we get runtime 
 generics as Swift.
I agree that templatism is bad. Are attributes really lowered to template arguments by the compiler? I also didn't mean to introduce new syntax with a comma between attributes. With memory attributes I really mean attributes like `scope`, `ref`, `private`, `pure`, `@nogc`, ... which are used with reference/pointer types, not functions.

You would be right that any assignment operation to an annotated reference needs a templated overload; I can't think of another way to implement it. In the worst case it would become something like

```D
Ref!(nogc, Flower) tulip;       // anything but garbage-collector allocation
Ref!(static, new, Bird) raven;  // no automatic allocation
```

I would already be happy with the most important attributes.

- "Oh, I see, it returns me GC-allocated memory."
- "Oh, the passed argument is allocated automatically, so I can't put the address into a static reference."
- "Oh, a slice over a fixed-size array will not work with that function."

---

Of course, the amount of safety gained from these attributes depends on the programmer. For example, they don't prevent use-after-free with `@newc` and `@newcpp` in every case, because a referenced value could suddenly be deleted by code which interrupts the function's execution. The true scope depends not only on the location of allocation but also on the location of the associated deallocation. If the deallocation can happen in a code block which interrupts normal function execution, then I would treat it either like `shared` or like `@memory`. The compiler can't know all of this by itself; the memory safety will only work if programmers use the proper attributes. But the fact that D already implements a very small, weaker subset of memory (or reference) attributes, like `scope` and `return ref`, shows that this idea fits D's design.

---

PS:

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 Why not removing the distinction between values and 
 references/pointers at all? But I think it drifts to hard in a 
 logically high level language and isn't the right way to go in 
 a system level language although very interesting.
I'm getting off-topic, but I totally agree with you! I have a systems programming language idea which treats every variable as a reference variable, to get rid of the annoying value categories and the value concept, by using a unified variable-access interface which allows different reference implementations for different optimization scenarios (like using registers to store and modify the referenced value).

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
 On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
```D
 stack arr2 = dynamicArr.dup;   // create a copy on stack, the 
stack is "scope"d
```
 [...]
What if dup is creating things on the heap (I don't know by the way). You need to make the allocator dynamically scoped.
This was supposed to be a side idea, unrelated to the main idea. `dup` does allocate memory with the GC, so you'd be right if we were talking about annotating references; but the snippet here is supposed to inject the annotated allocation of a *value* definition into the RHS, i.e. into `dup`'s internal implementation. If you like, I'll elaborate more on that idea.

Idea: annotating a value declaration specifies where the return value or expression value of the RHS is allocated, and thus where the variable will be located in memory. This theoretical idea would give more control over the variable's allocation. The idea is not odd, because a small subset of such attributes for value variables IS already implemented in D: static/global variables, automatic local variables (of course) and member-scope variables in structs and classes. C also features `@memory` (`volatile`) and, to some extent, `@register` (C99's `restrict` only comes close, keeping pointer-dereferenced values in registers for further dereferencing; language extensions can map variables to specific registers). I thought this would not be popular, because it seems D doesn't want to be too much of an alternative for C++ systems programming, and generalizing this concept seems like a bigger change. That's why this idea was only a side note.

```D
gc short opal = 3;     // eqv. to ref short opal = cast(ref short)GC.make!short();
opal = 5;
newc int emerald = 5;  // eqv. to ref int emerald = cast(ref int)malloc(int.sizeof);
emerald = 5;
new float ruby = 8.;   // new uses the "new" operator, which is not always dynamic allocation
rc int amethyst = 13;  // reference counted, basically an abstraction over an underlying shared pointer
...
free(&emerald);        // needed because newc is not automatically managed
```

A benefit is that these variables are still used like values, i.e. they are passed by value or by reference depending on the function parameter type, although physically they are references of course (because everything which is not stored in a register is actually a reference; variables on the call stack are referenced via the stack pointer, for example).

Goal: the responsibility for allocation is shifted from the service, the callee (which doesn't know about any concrete client's allocation needs), to the client, the caller (which knows about its own allocation needs and actually should know what it gets). GC was introduced to remove the symptoms of this problem (memory-management problems) without solving it (consequence: it gets used way more often than needed and is inefficient). The only way to solve it reasonably is letting the caller side (the LHS of the assignment) decide what it needs, not the callee side (the RHS of the assignment), because the caller side has to handle it afterwards. A generic solution would be some kind of dependency injector which handles the allocation, uses the callee to initialize the value and passes it to the caller. It would turn those attributes into a powerful abstraction. A very easy implementation of the dependency injector is overriding `theAllocator` while the RHS is computed.
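The `theAllocator` override can be sketched with `std.experimental.allocator` (as the module stood around this time); this is a minimal sketch of the "swap, evaluate RHS, restore" pattern, not a full dependency injector:

```D
// Temporarily route allocations through Mallocator while the RHS of a
// declaration is evaluated, then restore the previous allocator.
import std.experimental.allocator : allocatorObject, dispose, makeArray,
                                    theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

void main()
{
    auto saved = theAllocator;                       // remember the current allocator
    theAllocator = allocatorObject(Mallocator.instance);
    scope (exit) theAllocator = saved;               // restore no matter what

    // "RHS" evaluated under the injected allocator: a malloc-backed array.
    auto arr = theAllocator.makeArray!int(4, 9);
    assert(arr.length == 4 && arr[0] == 9);
    theAllocator.dispose(arr);                       // Mallocator memory is manual
}
```

This only redirects code that actually allocates through `theAllocator`; a routine that calls the GC directly (as `dup` does) would be unaffected, which is the limitation mentioned above.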
Jun 05 2021
parent sighoya <sighoya gmail.com> writes:
On Sunday, 6 June 2021 at 01:10:45 UTC, Elmar wrote:

 Goal: The responsibility of allocation is shifted from the 
 service, the callee, (which doesn't know about any concrete 
 client's allocation needs) to the client, the caller, (which 
 knows about it's own allocation needs and actually should know 
 what it gets).
That very much resembles the idea Odin strives for: https://odin-lang.org/docs/overview/#implicit-context-system

It sounds very interesting and indeed feels like systems programming, but I see some issues with it:

- the `new` call may not be the same for unique ptr, rc and gc, as gc needs more context
- rc/arc may need to insert incs and decs at the begin and end of the scope respectively, which means either templatizing the callee or boxing over a runtime option, both having drawbacks
- overriding allocators is an interesting concept, but how useful is it given that the code can't be re-adapted? Not only the allocator makes execution performant but also the code around it, and both are intertwined with each other
- you have a small performance hit from passing function pointers to the callee, or from exchanging them when global variables are used; it's akin to nondeterministic vs deterministic exception handling
- even more: how many custom allocators get passed to the callee, given that the callee, as a caller itself, has many callees inside which use allocation, and all the custom allocators propagate up?

I recognize that some minority is interested in this, even me to some degree; I'm still skeptical about the gain. However, you could try to extend the compiler to see if it is possible; maybe you find other people to realize your idea. Recalling myself, I know that russhy and IGotD are similarly interested in custom allocators.
Jun 06 2021
prev sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
 On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad 
 wrote:
 The separate compilation is a good point. Binary compatibility 
 is a common property considered for security safeguards. But at 
 least static checking with attributes would need no memory 
 addresses at all (also if the compiler can infer the attribute 
 for every value-typed variable automatically from where it is 
 defined).
I don't think separate compilation is a good point. I think a modern language should mix object-file and IR-file linking, with caching for speed improvements. Nevertheless, it is being used as an argument for not making D a better language, so that is what you are up against.

D isn't really "modern". It is very much in the C mold, like C++, and has taken on too many of C++'s flaws. For instance, it kept underperforming exceptions instead of making them fast.
 passes addresses obtained from variables. As a C programmer I'd 
 say that C's pointer concept was never needed as it stands, it 
 just was created to be an unsafe reference variable + a 
 reference + an iterator all-in-one-solution as the simplest 
 generic thing which beats it all (without knowing the use case 
 by looking at the pointer type).
C is mostly an abstraction over common machine-language instructions, which makes a non-optimizing C backend perform reasonably well for handcrafted C code. C pointers do have a counterpart in C++ STL iterators, though; one could argue that C pointers are memory iterators.
 Right, if nobody tried it so far I'd like myself. Then I can 
 firm my D experience with further practice. I'd compare the 
 nature of static and dynamic attribute checks to the nature of 
 C++ `static_cast` and `dynamic_cast` of class pointers. I was 
 thinking, such a user library could use `__traits` with 
 templated operator overloads.
Sounds like a fun project. (D, as the languages stands, encourages the equivalent of reinterpret_cast, so there is that.)
 So, is that D's tenor that function signatures are thought to 
 create *in*transparency and should continue to do so? Does the 
 community think, allocation and memory transparency is a bad 
 thing or just not needed? IMO, allocation and memory 
 transparency is relevant to being a serious Systems programming 
 language
Let us not confuse community with creators. :). Also, let us not assume that there is a homogeneous community. So, you have the scripty-camp who are not bothered by the current GC and don't really deal with memory allocations much. Then there is the other camp. As one of those in the other camp, I think that the compiler should do the memory management and be free to optimize. So I am not fond of things like "scope". I think they are crutches. I think the language is becoming arcane by patching it up here and there instead of providing a generic solution.
 I probably should (try to) implement it myself as a proof of 
 concept.
The best option is to just introduce a custom pointer library, like in C++, that tracks what you want it to track. Don't bother with separate-compilation issues; just template all functions. I think LDC will remove duplicates if the bodies of two functions turn into the same machine code? Then you get a feeling for what it would be like.
Jun 04 2021
parent reply Elmar <chrehme gmx.de> writes:
On Friday, 4 June 2021 at 08:29:47 UTC, Ola Fosheim Grøstad wrote:
 I don't think separate compilation is a good point. I think a 
 modern language should mix object-file and IR-file linking with 
 caching for speed improvements.
I know what you mean. Avoiding separate compilation where possible allows for more optimization potential but in general you can't avoid it, particularly when source code is not available. D tries to be linkable with C and C++ which don't know D. That's why working with separate binaries needs to be considered (and compatibility with C/C++ really is important because both languages have large code bases). So with "good point" I meant "an important aspect" in research, not that it's nice to need it :-) .
 C pointers do have a counterpart in C++ STL iterators though. 
 So, one could argue that C-pointers are memory-iterators.
Exactly.
 So, you have the scripty-camp who are not bothered by the 
 current GC and don't really deal with memory allocations much. 
 Then there is the other camp.

 As one of those in the other camp, I think that the compiler 
 should do the memory management and be free to optimize. So I 
 am not fond of things like "scope". I think they are crutches. 
 I think the language is becoming arcane by patching it up here 
 and there instead of providing a generic solution.
`scope` exists for a good reason, because guaranteeing compiler optimization is very hard (apparently). The `scope` attribute has two benefits: first, it makes RAII explicit (which helps prevent bugs from using it falsely); second, it makes the compiler faster and simpler, because the compiler doesn't need to understand the code in order to find optimization opportunities. Performing an optimization is easier than finding the places that can be optimized and then proving the optimization correct. Finding all optimizations automatically (the ideal case) would probably explode compile time, so attributes are given to programmers as hints; that solves the compile-time issue and reduces compiler complexity a lot.

The problem with genericity is that it's always a tradeoff with efficiency. It makes coding more complicated and the solution heavier (because it covers more cases), at some point too complicated to be practical. There are cases where the compiler cannot know the correctness of an optimization in any way, because it depends on the programmer's intent (or would change behaviour). Then explicit optimization hints are required.
 Don't bother with separate compilation issues. Just template 
 all functions. I think LDC will remove duplicates if the bodies 
 of two functions turn into the same machine code?
I'm almost certain that duplicate functions will not be removed in all cases, because that's a very difficult problem to solve. It just reminds me of the undecidability of the Liskov Substitution principle according to Wikipedia. This principle requires that contracts/halting still hold with subtype arguments passed to a function. Have a nice day.
Jun 05 2021
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 5 June 2021 at 17:55:26 UTC, Elmar wrote:
 `scope` exists for a good reason because guaranteeing compiler 
 optimization is very hard (apparently). The `scope` attribute 
 has two benefits: 1st it makes RAII explicit (which helps to 
 prevent bugs by falsely using it), 2nd it makes the compiler 
 faster and easier because it doesn't need to understand the 
 code in order to search for optimization places.
But introducing all these special cases just to avoid explicit lifetimes like Rust's is making the language more complicated, not less. The intention is to make it less complicated, but I don't think that will be the end result. I don't think one can evolve a solution; it has to be designed as a whole, not one piece at a time, as D is doing now.
Jun 05 2021