digitalmars.D.learn - GC.addRange in pure function

vitamin <vit vit.vit> writes:

Why is using 'new' allowed in pure functions, but calling 
GC.addRange or GC.removeRange isn't?

Feb 07 2021

frame <frame86 live.com> writes:

On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

Does 'new' violate the 'pure' paradigm? Pure functions can only 
call pure functions, and GC.addRange and GC.removeRange are only 
'nothrow @nogc'.

Feb 08 2021

rm <rymrg memail.com> writes:

On 09/02/2021 5:05, frame wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling GC.addRange 
 or GC.removeRange isn't allowed?

 
 Does 'new' violate the 'pure' paradigm? Pure functions can only call 
 pure functions and GC.addRange or GC.removeRange is only 'nothrow @nogc'.

new allocates memory via the GC and the GC knows to scan this location. 
Seems like implicit GC.addRange.

Feb 10 2021

vit <vit vit.vit> writes:

On Wednesday, 10 February 2021 at 12:17:43 UTC, rm wrote:
 On 09/02/2021 5:05, frame wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

 
 Does 'new' violate the 'pure' paradigm? Pure functions can 
 only call pure functions and GC.addRange or GC.removeRange is 
 only 'nothrow @nogc'.

 new allocates memory via the GC and the GC knows to scan this 
 location. Seems like implicit GC.addRange.

Yes, this is my problem: if `new` can create an object in a pure 
function, then GC.addRange and GC.removeRange may be pure too.

Can I call GC.addRange and GC.removeRange from a pure function 
without problem? (using assumePure(...)() ).

Feb 10 2021

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 10 February 2021 at 13:44:53 UTC, vit wrote:
 On Wednesday, 10 February 2021 at 12:17:43 UTC, rm wrote:
 On 09/02/2021 5:05, frame wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

 
 Does 'new' violate the 'pure' paradigm? Pure functions can 
 only call pure functions and GC.addRange or GC.removeRange is 
 only 'nothrow @nogc'.

 new allocates memory via the GC and the GC knows to scan this 
 location. Seems like implicit GC.addRange.

 Yes, this is my problem: if `new` can create an object in a pure 
 function, then GC.addRange and GC.removeRange may be pure 
 too.

 Can I call GC.addRange and GC.removeRange from pure function 
 without problem? (using assumePure(...)() ).

TL;DR Yes, you can, but it depends on what "without problem" 
means for you :P



===================================

According to D's general approach to purity, malloc/free/GC.* are 
indeed impure as they read and write global **mutable** state, 
but are still allowed in pure functions **if encapsulated 
properly**. The encapsulation is done by @trusted wrappers which 
must be carefully audited by humans - the compiler can't help you 
with that.

The general rule that you must follow for such 
*callable-from-pure* code (technically it is labeled as `pure`, 
e.g.:

     pragma(mangle, "malloc") pure @system @nogc nothrow
     void* fakePureMalloc(size_t);

but I prefer to make the conceptual distinction) is that the 
effect of calling the @trusted wrapper must not drastically leak 
or be observed.
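To make the encapsulation idea concrete, here is a minimal sketch that relies on `pureMalloc` and `pureFree` from `core.memory`. The function name `sumOfSquares` is made up for illustration. The scratch buffer never escapes, so the C-heap traffic is not visible to callers at the level this post cares about.

```d
import core.memory : pureMalloc, pureFree;

/// Hypothetical example: sums the squares of `values` using a temporary
/// C-heap scratch buffer that is released before returning.
long sumOfSquares(scope const(int)[] values) pure nothrow @nogc @trusted
{
    if (values.length == 0)
        return 0;

    auto scratch = cast(long*) pureMalloc(values.length * long.sizeof);
    if (scratch is null)
        assert(0, "out of memory");
    scope (exit) pureFree(scratch); // the allocation never leaves this function

    foreach (i, v; values)
        scratch[i] = cast(long) v * v;

    long total = 0;
    foreach (i; 0 .. values.length)
        total += scratch[i];
    return total;
}
```

The `@trusted` on the function is the part a human has to audit: the compiler accepts it as `pure` without checking that nothing leaks.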

What "drastically" means depends on what you want `pure` to mean 
in your application. Which side effects do you want to protect 
against by using `pure`? It is really a high-level concern that 
you as a developer must decide on when writing/using @trusted 
pure code in your program. For example, generally everyone will 
agree that network calls are impure. But what about logging? It's 
impure by definition, since it mutates a global log stream. But 
is this effect worth caring about? In some specific situations it 
may be OK to ignore. This is why in D you can call `writeln` in 
`pure` functions, as long as it's inside a `debug` block. But 
given that you as a developer can decide whether to pass the 
`-debug` option to the compiler, essentially you're in control of 
what `pure` means for your codebase, at least to some extent.
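A small sketch of the `debug` escape hatch mentioned above. The function name `addWithTrace` is just for illustration. The `writeln` call is only compiled in when `-debug` is passed, and the function is accepted as `pure` either way.

```d
import std.stdio : writeln;

/// Hypothetical example: a pure function that logs only in debug builds.
int addWithTrace(int a, int b) pure
{
    // Purity checks are relaxed inside `debug` statements.
    debug writeln("addWithTrace(", a, ", ", b, ")");
    return a + b;
}
```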

100% mathematical purity is impossible even in the most strict 
functional programming language implementations, since our 
programs run on actual hardware and not on an idealized 
mathematical machine. For example, even the act of reading 
immutable data can be globally observed by measuring the 
memory access times - see Spectre [1] and all other 
microarchitecture side-channel [2] vulnerabilities.

[1]: 
https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
[2]: https://en.wikipedia.org/wiki/Side-channel_attack

That said, function purity is not useless at all, quite the 
contrary. It is about making your programs more deterministic and 
easy to reason about. We all want fewer bugs in our code and less 
time spent chasing hard to reproduce crashes, right?

`pure` is really about limiting, containing / compartmentalizing 
and controlling the (non-deterministic) global effects in your 
program. Ideally you should strive to structure your programs as 
a pure core, driven by an imperative, impure shell. E.g. if 
you're working on an accounting application, the core is the part 
that implements the main domain / business logic and should be 
100% deterministic and pure. The imperative shell is the part 
that reads spreadsheet files, exports to pdf, etc. (actually just 
the actual file I/O needs to be impure - the actual decoding / 
encoding of data structures can be perfectly pure).


Now, back to practice and the question of memory management.

Of course allocating memory is a globally observable effect and 
even locally one can compare pointers, as Paul Backus mentioned, 
as D is a systems language. However, as a practical concession, 
D's concept of pure-ity is about ensuring high-level invariants 
and so such low-level concerns can be ignored, as long as the 
codebase doesn't observe them. What does it mean to observe them? 
Here's an example:

---
void main()
{
     import std.stdio : writeln;
     observingLowLevelSideEffects.writeln; // `false`, but could be `true`
     notObservingSideEffects.writeln; // always `true`
}

// BAD:
bool observingLowLevelSideEffects() pure
{
     immutable a = [2];
     immutable b = [2];
     return a.ptr == b.ptr;
}

// OK
bool notObservingSideEffects() pure
{
     immutable a = [2];
     immutable b = [2];
     return a == b;
}
---

`observingLowLevelSideEffects` is bad because, according to the 
language rules, the compiler is free to make `a` and `b` point to 
the same immutable array. The result of the function is therefore 
implementation defined (or worse, unspecified), which is exactly 
what purity should help us avoid. If 
`observingLowLevelSideEffects` was not marked as `pure` it 
wouldn't be "BAD", just "meh". In contrast, 
`notObservingSideEffects` is "OK", even though ironically the 
implementation of array equality first compares the pointers. So, 
`notObservingSideEffects` is basically doing the same as 
`observingLowLevelSideEffects` plus some extra code.

So it's really just a question of whether the side-effects can be 
observed.

If in order to perform some calculation a function allocated some 
temporary memory on the heap, but then freed it once it was done, 
would anyone else care? If you're on a microcontroller with 
very limited memory then yes, but otherwise probably no.
And what if the function didn't allocate any additional memory? 
And what if the function is memoized (i.e. it caches the result 
of the calculation for some set of arguments)? If the cache was 
shared by all threads and protected by a mutex it could be a 
problem. Especially if the code locks the mutex while the 
function is executing, but then the function proceeds to acquire 
another mutex - it starts to smell like a deadlock possibility. 
But what if the cache was just thread-local - surely this must be 
better? The answer is "yes", even though as far as the language 
is concerned whether a global mutable variable is thread-local or 
`shared` doesn't matter for function purity. But one is obviously 
more deterministic than the other, even though that is hard to 
quantify.

So a good heuristic is that the more a side-effect is localized 
and controlled, the easier it would be to argue that the code is 
pure, as far as your application is concerned.
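As a rough sketch of the thread-local memoization case: the `assumePure` helper below is hypothetical (it is modeled on the `SetFunctionAttributes` example in the `std.traits` documentation and is not a public Phobos function), and `squareCache`, `cachedSquareImpl` and `cachedSquare` are names invented for this example. The cache is an ordinary module-level variable, which is thread-local by default in D.

```d
import std.traits : FunctionAttribute, SetFunctionAttributes,
    functionAttributes, functionLinkage, isDelegate, isFunctionPointer;

// Hypothetical helper: casts a function pointer to the same type plus `pure`.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Module-level variables are thread-local unless marked `shared`/`__gshared`.
private int[int] squareCache;

// Impure implementation: reads and writes the thread-local cache.
private int cachedSquareImpl(int x)
{
    if (auto hit = x in squareCache)
        return *hit;
    const result = x * x;
    squareCache[x] = result;
    return result;
}

/// Callable from pure code; the cache only affects speed, not results.
int cachedSquare(int x) pure @trusted
{
    return assumePure(&cachedSquareImpl)(x);
}
```

Whether this counts as "pure enough" is exactly the judgment call described above: the effect is confined to one thread and cannot change the returned value.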

--------

Okay, but really what about `GC.addRange` and `GC.removeRange`?

The litmus test is whether the side-effects are controlled, i.e. 
whether your code has strong exception-safety [3][4][5], 
transactional semantics, ... or in other words what happens 
inside it stays inside it.

[3]: 
https://docs.microsoft.com/en-us/cpp/cpp/how-to-design-for-exception-safety?view=msvc-160#strong-guarantee
[4]: https://www.stroustrup.com/except.pdf
[5]: https://www.boost.org/community/exception_safety.html

So if you're implementing an RAII container, then yes, you can 
mark its functions as `pure`, as the destructor will unwind the 
side effects, so at least at a high level whether GC.addRange / 
GC.removeRange were called is not observable.
Even more, if your container was pure, but you forgot to add 
calls to GC.addRange/removeRange, and you stored references to 
GC-allocated data inside those ranges, the use-after-free bugs 
would surely be drastically observable, even if they occur 
rarely, so well-placed calls to `GC.addRange/removeRange` can 
make your code more "pure", even if not `pure` :D
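Here is a minimal sketch of such an RAII container, under a few assumptions: `assumePure` is a hypothetical helper modeled on the `SetFunctionAttributes` example in the `std.traits` docs, and `SliceStore`, `registerRange` and `unregisterRange` are names made up for this example. The two shims take a mutable `void*` and have fixed signatures so that no default arguments are involved when calling through the function pointer.

```d
import core.memory : GC, pureMalloc, pureFree;
import std.traits : FunctionAttribute, SetFunctionAttributes,
    functionAttributes, functionLinkage, isDelegate, isFunctionPointer;

// Hypothetical helper: casts a function pointer to the same type plus `pure`.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Impure shims around the GC calls.
void registerRange(void* p, size_t size) nothrow @nogc { GC.addRange(p, size); }
void unregisterRange(void* p) nothrow @nogc { GC.removeRange(p); }

/// malloc-backed storage for GC-allocated slices.
struct SliceStore
{
    private int[]* slots;
    private size_t count;

    @disable this(this); // no copies, so exactly one removeRange/free

    this(size_t n) pure nothrow @nogc @trusted
    {
        if (n == 0)
            return;
        slots = cast(int[]*) pureMalloc(n * (int[]).sizeof);
        if (slots is null)
            assert(0, "out of memory");
        slots[0 .. n] = null; // clear before the GC is told to scan it
        count = n;
        assumePure(&registerRange)(slots, n * (int[]).sizeof);
    }

    ~this() pure nothrow @nogc @trusted
    {
        if (slots is null)
            return;
        assumePure(&unregisterRange)(slots); // undo the side effect
        pureFree(slots);
        slots = null;
    }

    ref int[] opIndex(size_t i) pure nothrow @nogc @trusted
    {
        assert(i < count);
        return slots[i];
    }
}

/// A pure function can now use the container like any other value.
int demo() pure
{
    auto store = SliceStore(2);
    store[1] = new int[](3);
    store[1][0] = 5;
    return store[1][0];
}
```

The constructor and destructor are the only places that touch GC ranges, so what happens inside the container stays inside it.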

Feb 10 2021

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 10 February 2021 at 16:25:44 UTC, Petar Kirov 
[ZombineDev] wrote:
 [..]

A few practical examples:

Here it is deemed that the only observable side-effect of 
`malloc` and friends is the setting of `errno` in case of 
failure, so these wrappers ensure that this is not observed. 
Surely there are low-level ways to observe it (and also the act 
of allocating / deallocating memory on the C heap), but this is 
the definition of purity that the standard library has decided is 
reasonable:
https://github.com/dlang/druntime/blob/master/src/core/memory.d#L1082-L1150

These two function calls in Array.~this() can be marked as 
`pure`, as the Array type as a whole implements the RAII design 
pattern and offers at least basic exception-safety guarantees:
https://github.com/dlang/phobos/blob/81a968dee68728f7ea245b6983eb7236fb3b2981/std/container/array.d#L296-L298

(The whole function is not marked pure, as the purity depends on 
the purity of the destructor of the template type parameter `T`.)

Feb 10 2021

vitamin <vit vit.vit> writes:

On Wednesday, 10 February 2021 at 16:25:44 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Wednesday, 10 February 2021 at 13:44:53 UTC, vit wrote:
 [...]

 TL;DR Yes, you can, but it depends on what "without problem" 
 means for you :P

 [...]

Thanks,

Yes, I am implementing a container (ref counted pointer). When 
the allocator is Mallocator (pure allocate and deallocate) and 
the Type inside the rc pointer has a pure constructor and 
destructor, then the only impure calls were GC.addRange and 
GC.removeRange.
Now they are marked as pure.

Feb 12 2021

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Friday, 12 February 2021 at 19:48:01 UTC, vitamin wrote:
 On Wednesday, 10 February 2021 at 16:25:44 UTC, Petar Kirov 
 [ZombineDev] wrote:
 On Wednesday, 10 February 2021 at 13:44:53 UTC, vit wrote:
 [...]

 TL;DR Yes, you can, but it depends on what "without problem" 
 means for you :P

 [...]

 Thanks,

 Yes, I am implementing a container (ref counted pointer). When 
 the allocator is Mallocator (pure allocate and deallocate) and 
 the Type inside the rc pointer has a pure constructor and 
 destructor, then the only impure calls were GC.addRange and 
 GC.removeRange.
 Now they are marked as pure.

Great, that's the exact idea!

Feb 12 2021

Per Nordlöw <per.nordlow gmail.com> writes:

On Tuesday, 9 February 2021 at 03:05:10 UTC, frame wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?


Would making

`new T[]` inject a call to `GC.addRange` based on `T` (and maybe 
also T's attributes)

be a step forward?

Feb 12 2021

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Friday, 12 February 2021 at 12:17:13 UTC, Per Nordlöw wrote:
 On Tuesday, 9 February 2021 at 03:05:10 UTC, frame wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?


 Would making

 `new T[]` inject a call to `GC.addRange` based on `T` (and 
 maybe also T's attributes)

 be a step forward?

`GC.addRange` is only used for memory allocated outside of the GC 
that can hold references to GC allocated objects. Since `new T[]` 
uses the GC, all the information in the typeinfo is already there 
(*), so `GC.addRange` is unnecessary and even wrong, because when 
the GC collects the memory it won't call `GC.removeRange` on it.

Implementation-wise, metadata about GC-allocated memory is held 
in the GC internal data structures, whereas the GC roots and 
ranges are stored in separate malloc/free-managed containers.

(*) Currently `new T[]` is lowered to an `extern (C)` runtime 
hook and the compiler passes to it typeid(T). After this the call 
chain is: _d_newarray{T,iT,mTX,miTX} -> _d_newarrayU 
-> __arrayAlloc -> GC.qalloc -> ConservativeGC.mallocNoSync -> 
Gcx.alloc -> {small,big}Alloc -> setBits
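To illustrate the distinction, here is a short sketch using only `core.memory` queries. The helper names `isGcOwned`, `gcScansBlock` and `keepAliveViaRange` are invented for this example, and the `NO_SCAN` behaviour shown is what I would expect from the default GC rather than a documented guarantee.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

/// True if `p` points into a block managed by the GC.
bool isGcOwned(const void* p) nothrow @nogc
{
    return GC.addrOf(p) !is null;
}

/// True if the GC-owned block containing `p` is scanned for pointers.
bool gcScansBlock(const void* p) nothrow @nogc
{
    auto base = GC.addrOf(p);
    return base !is null && !(GC.getAttr(base) & GC.BlkAttr.NO_SCAN);
}

/// malloc'ed memory is invisible to the GC until a range is registered.
int keepAliveViaRange()
{
    enum size = 4 * (int*).sizeof;
    auto raw = cast(int**) malloc(size);
    if (raw is null)
        assert(0, "out of memory");
    scope (exit) free(raw);

    raw[0 .. 4] = null;
    GC.addRange(raw, size);          // needed: the GC does not own `raw`
    scope (exit) GC.removeRange(raw); // runs before free(raw)

    raw[0] = new int;                // only reference lives in malloc memory
    *raw[0] = 42;
    GC.collect();
    return *raw[0];
}
```

For `new T[]` none of this bookkeeping is needed, because the block's attributes already tell the GC whether to scan it.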

Feb 12 2021

Temtaime <temtaime gmail.com> writes:

On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

pure is broken. Just don't [use it]

Feb 09 2021

Max Haughton <maxhaton gmail.com> writes:

On Tuesday, 9 February 2021 at 19:53:27 UTC, Temtaime wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

 pure is broken. Just don't [use it]


[Citation needed]

Feb 09 2021

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 9 February 2021 at 20:50:12 UTC, Max Haughton wrote:
 On Tuesday, 9 February 2021 at 19:53:27 UTC, Temtaime wrote:
 On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
 Why using 'new' is allowed in pure functions but calling 
 GC.addRange or GC.removeRange isn't allowed?

 pure is broken. Just don't [use it]


 [Citation needed]

Allowing memory allocation in pure code in a language that can 
distinguish between pointer equality and value equality is, let's 
say, "unprincipled."

Feb 09 2021

Dominikus Dittes Scherkl <dominikus scherkl.de> writes:

On Tuesday, 9 February 2021 at 21:00:39 UTC, Paul Backus wrote:
 On Tuesday, 9 February 2021 at 19:53:27 UTC, Temtaime wrote:
 pure is broken. Just don't [use it]


 Allowing memory allocation in pure code in a language that can 
 distinguish between pointer equality and value equality is, 
 let's say, "unprincipled."

pure in D is a very useful concept, even if it's not literally 
the same as pure in functional languages. Recommending not to use 
it is bad advice IMHO.

Feb 10 2021
