
digitalmars.D - Actual immutability enforcement by placing immutable data into

reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
Right now D compilers place string literals into a read-only 
section, but most of the other types of `static immutable` data 
have no protection against rogue writes.
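
As a rough illustration of what "placed into a read-only section" means at the process level, here is a C analogue (not D code). It assumes Linux, because it reads `/proc/self/maps`, and the names `ro_table`, `rw_counter` and `mapping_is_writable` are made up for this sketch. It looks up the memory mapping that contains a given object and reports whether that mapping is writable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Initialized at compile time; toolchains usually put this into .rodata. */
static const int ro_table[4] = {1, 2, 3, 4};

/* Ordinary mutable global; it lives in a writable data section. */
int rw_counter = 1;

/* Returns 1 if the mapping containing addr is writable, 0 if it is not,
   and -1 if the mapping could not be determined (e.g. not on Linux). */
static int mapping_is_writable(const void *addr)
{
    assert(addr != NULL);

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[4096];
    int result = -1;
    uintptr_t a = (uintptr_t)addr;

    /* Each line starts with "start-end perms", e.g. "55d0a000-55d0b000 r--p". */
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (a >= start && a < end) {
            result = (perms[1] == 'w');
            break;
        }
    }

    fclose(f);
    return result;
}
```

With a typical GCC or Clang toolchain on Linux, `mapping_is_writable(ro_table)` is expected to return 0 and `mapping_is_writable(&rw_counter)` to return 1, though this depends on the toolchain and linker.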

https://forum.dlang.org/post/cmtaeuedmdwxjecpcrjh@forum.dlang.org 
is an example of a non-obvious case of immutable data corruption. 
What's happening there is that [druntime 
modifies](https://github.com/dlang/dmd/blob/v2.101.1/druntime/src/rt/deh.d#L46)
the static immutable instance of Exception when throwing it.

The old bug report https://issues.dlang.org/show_bug.cgi?id=12118 
is also related to throwing an immutable Exception, but the 
corruption is done by the user code in the catch block.

Troubleshooting such problems would have been so much easier if 
immutable objects were actually placed in a read-only section and 
any write attempts triggered segfaults at runtime. I think that 
[bare metal code for 
microcontrollers](https://forum.dlang.org/post/rkrpdgjnhwdysqnnblf@forum.dlang.org)
could also potentially benefit from this, because this would enable
placing immutable data generated by CTFE into NOR flash instead of
wasting SRAM space.

What do you think about it? Does this require a new DIP?
Dec 19 2022
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka 
wrote:
 What do you think about it? Does this require a new DIP?
BTW, I tried to experiment with DMD code and can place some array literals into a read-only section: https://github.com/ssvb/dmd/commit/44c3a7c312b042fa7fafd357775aedf904ba0700

But much more seems to be needed to get it right. Nested array literals, such as the `[1,2]` part of `[[1,2],[3,4]]`, don't seem to have the immutable flag set when checked from https://github.com/dlang/dmd/blob/v2.101.1/compiler/src/dmd/todt.d#L456-L490

Additionally, the immutable flag seems to be stripped at https://github.com/dlang/dmd/blob/v2.101.1/compiler/src/dmd/tocsym.d#L189-L219 from immutable class and struct instances if they have constructors. But does this really matter for the data generated by CTFE?

Detecting whether the data was generated by CTFE also doesn't seem to be very obvious. I tried to check the `.ownedByCtfe` field, but I'm getting strange results. Can anyone give me some hints?
Dec 19 2022
prev sibling parent reply bauss <jacobbauss gmail.com> writes:
On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka 
wrote:
 [...]

 What do you think about it? Does this require a new DIP?
Isn't it going to be difficult to implement properly? You can't really place data into read-only memory directly; you have to protect whole pages, e.g. with VirtualProtect() on Windows. That is especially true given that immutable data can still be allocated through the GC. Or am I not understanding something about this at all?
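
To make the page-granularity point concrete, here is a minimal C sketch using POSIX `mmap`/`mprotect`, which play the role that VirtualProtect() plays on Windows. It is not how any D compiler or druntime does things; the helper names `make_readonly_ints` and `write_faults` are hypothetical. The data is copied into freshly mapped whole pages, the pages are then switched to read-only, and a forked child is used to check that a write is caught by the hardware.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copies count ints into freshly mapped pages and makes those pages
   read-only. Protection works on whole pages, so the length is rounded up
   to a multiple of the page size. Returns NULL on failure. */
static int *make_readonly_ints(const int *src, size_t count, size_t *out_len)
{
    assert(src != NULL && count > 0 && out_len != NULL);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = ((count * sizeof(int) + page - 1) / page) * page;

    int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, src, count * sizeof(int));   /* initialize while still writable */

    if (mprotect(p, len, PROT_READ) != 0) {  /* from now on: read-only */
        munmap(p, len);
        return NULL;
    }
    *out_len = len;
    return p;
}

/* Attempts a write in a child process so that the parent survives.
   Returns 1 if the child was killed by SIGSEGV or SIGBUS, 0 if the write
   went through, and -1 on error. */
static int write_faults(volatile int *p)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        *p = 42;      /* faults if the page is read-only */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) &&
        (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS))
        return 1;
    return 0;
}
```

Because the protection applies to whole pages, even a handful of bytes costs at least one page here, which is part of why doing this for every GC-allocated immutable object would be expensive.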
Dec 19 2022
next sibling parent reply bauss <jacobbauss gmail.com> writes:
On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:
 On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka 
 wrote:
 [...]

 What do you think about it? Does this require a new DIP?
Isn't it going to be difficult to implement properly? You can't really place data into read-only memory directly; you have to protect whole pages, e.g. with VirtualProtect() on Windows. That is especially true given that immutable data can still be allocated through the GC. Or am I not understanding something about this at all?
Of course literals can be placed in read-only and should be, so in that case I think this would be good, BUT I don't think it's possible to really do for __all__ immutable data.
Dec 19 2022
parent reply IGotD- <nise nise.com> writes:
On Monday, 19 December 2022 at 14:07:49 UTC, bauss wrote:
 Of course literals can be placed in read-only and should be, so 
 in that case I think this would be good, BUT I don't think it's 
 possible to really do for __all__ immutable data.
https://dlang.org/articles/const-faq.html

*What is immutable good for?*

*Immutable data, once initialized, is never changed. This has many uses:*

- *Access to immutable data need not be synchronized when multiple threads read it.*
- *Data races, tearing, sequential consistency, and cache consistency are all non-issues when working with immutable data.*
- *When doing a deep copy of a data structure, the immutable portions need not be copied.*
- *Invariance allows a large chunk of data to be treated as a value type even if it is passed around by reference (strings are the most common case of this).*
- *Immutable types provide more self-documenting information to the programmer.*
- ***Immutable data can be placed in hardware protected read-only memory, or even in ROMs.***
- *If immutable data does change, it is a sure sign of a memory corruption bug, and it is possible to automatically check for such data integrity.*
- *Immutable types provide for many program optimization opportunities.*

*const acts as a bridge between the mutable and immutable worlds, so a single function can be used to accept both types of arguments.*

I always interpreted immutable as something that must be constructed during compile time and put in the RO section of the program.
Dec 19 2022
parent reply bauss <jacobbauss gmail.com> writes:
On Monday, 19 December 2022 at 15:25:17 UTC, IGotD- wrote:
 [...]
 I always interpreted immutable as something that must be 
 constructed during compile time and put in the RO section of 
 the program.
Yes, but it's not the reality. Immutable data can be constructed at runtime and it happens all the time in shared static constructors etc. I think it would be a too big breaking change that you suddenly can't do that anymore.

Ex. the following program is valid:

```
import std.stdio : writeln;
import std.datetime : Clock;

immutable int a;

shared static this()
{
    a = Clock.currTime().year;
}

void main()
{
    writeln(a);
}
```

In the above example "a" cannot be placed in read-only memory.

Of course my example isn't something you would do in an every day program, BUT it could be substituted for values loaded from a file etc.
Dec 19 2022
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 19 December 2022 at 15:34:35 UTC, bauss wrote:
 Yes, but it's not the reality. Immutable data can be 
 constructed at runtime and it happens all the time in shared 
 static constructors etc. I think it would be a too big breaking 
 change that you suddenly can't do that anymore.

 Ex. the following program is valid:

 ```
 import std.stdio : writeln;
 import std.datetime : Clock;

 immutable int a;

 shared static this()
 {
     a = Clock.currTime().year;
 }

 void main()
 {
     writeln(a);
 }
 ```

 In the above example "a" cannot be placed in read-only memory.

 Of course my example isn't something you would do in an every 
 day program, BUT it could be substituted for values loaded from 
 a file etc.
Couldn't D just have used the 'const' keyword for such data?
Dec 19 2022
parent Nick Treleaven <nick geany.org> writes:
On Monday, 19 December 2022 at 15:52:29 UTC, IGotD- wrote:
 Couldn't D just have used the 'const' keyword for such 
 data?
Then you wouldn't be able to share it across threads. Besides, you can still do `new immutable int` at runtime, so it is consistent.
Dec 21 2022
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 19 December 2022 at 15:34:35 UTC, bauss wrote:
 On Monday, 19 December 2022 at 15:25:17 UTC, IGotD- wrote:
 [...]
 I always interpreted immutable as something that must be 
 constructed during compile time and put in the RO section of 
 the program.
 Yes, but it's not the reality. Immutable data can be 
 constructed at runtime and it happens all the time in shared 
 static constructors etc. I think it would be a too big breaking 
 change that you suddenly can't do that anymore.

 Ex. the following program is valid:

 ```D
 import std.stdio : writeln;
 import std.datetime : Clock;

 immutable int a;

 shared static this()
 {
     a = Clock.currTime().year;
 }

 void main()
 {
     writeln(a);
 }
 ```

 In the above example "a" cannot be placed in read-only memory.
The compiler will reject your constructor if you change "immutable int a;" to "immutable int a = 2030;":

    test.d(8): Error: cannot modify `immutable` expression `a`

If a variable is both declared and initialized simultaneously, then it's probably safe to be placed into a read-only section. Please correct me if I'm wrong.
Dec 19 2022
prev sibling next sibling parent Tejas <notrealemail gmail.com> writes:
On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:
 On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka 
 wrote:
 [...]
Isn't it going to be difficult to implement properly? You can't really place data into read-only memory directly; you have to protect whole pages, e.g. with VirtualProtect() on Windows. That is especially true given that immutable data can still be allocated through the GC. Or am I not understanding something about this at all?
That is why he specified `static immutable` rather than just `immutable`.
Dec 19 2022
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:
 What do you think about it? Does this require a new DIP?
Isn't it going to be difficult to implement properly? You can't really place data into read-only memory directly; you have to protect whole pages, e.g. with VirtualProtect() on Windows. That is especially true given that immutable data can still be allocated through the GC. Or am I not understanding something about this at all?
I did mention static immutable and CTFE in my initial message. Some of the immutable data is generated at compile time and can safely go into read-only sections. Right now I'm only interested in trying to improve just this.

But since you mentioned catching write accesses to the immutable data backed by GC allocations, this can be done with some help from extra tools or instrumentation. For example, I did use valgrind to debug the code from https://forum.dlang.org/post/cmtaeuedmdwxjecpcrjh@forum.dlang.org

```C
#include <stddef.h>
#include <valgrind/memcheck.h>

void vg_mark_block(void *p, size_t size)
{
    /* Give the block a label so it can be recognized in valgrind's reports. */
    int valgrind_handle = VALGRIND_CREATE_BLOCK(p, size, "MARKED BLOCK");
    /* Mark the block as inaccessible: any later read or write is reported. */
    VALGRIND_MAKE_MEM_NOACCESS(p, size);
}
```

```D
extern(C) void vg_mark_block(void *p, size_t size) @nogc;

void main() @nogc
{
    try
    {
        static immutable e = new Exception("test");
        // Mark the memory of the exception object itself before throwing it.
        vg_mark_block(cast(void*)e, __traits(classInstanceSize, typeof(e)));
        throw e;
    }
    catch (Exception e)
    {
        assert(e.msg == "test");
    }
}
```

```
==3369== Invalid write of size 8
==3369==    at 0x4D5BEAE: _d_createTrace (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5D4F9: _d_throwdwarf (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x1091C2: _Dmain (in /tmp/test/test)
==3369==    by 0x4D5CEBE: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll().__lambda2() (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CD6D: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CE46: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll() (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CD6D: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CCD6: _d_run_main2 (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CA9F: _d_run_main (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x10923F: main (in /tmp/test/test)
==3369==  Address 0x10c098 is 56 bytes inside a MARKED BLOCK of size 76 client-defined
==3369==    at 0x1095A1: vg_mark_block (in /tmp/test/test)
==3369==    by 0x1091B3: _Dmain (in /tmp/test/test)
```

Unfortunately valgrind reports both read and write accesses to this area in the log, so the noise about "invalid reads" needs to be filtered out. It doesn't support marking an address range as read-only out of the box: https://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs (but maybe this can be improved?).

ASAN instrumentation could also potentially be useful in the future for catching write accesses to the immutable data backed by GC allocations. And I'm pleasantly surprised to see that ASAN is [already available in LDC](http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html). However, just like valgrind, right now ASAN doesn't support poisoning a memory area as read-only: https://www.mail-archive.com/address-sanitizer@googlegroups.com/msg01948.html
Dec 19 2022