www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - AA vs __gshared

reply IchorDev <zxinsworld gmail.com> writes:
I've been getting a lot of segfaults from using associative 
arrays recently. The faults happen seemingly at random, and from 
pretty mundane stuff like `if(auto x = y in z)` that run very 
often:
```
Segmentation fault.

const(void*), scope const(TypeInfo)) inout ()

```

I suspect that this is because they've all been placed inside 
`__ghared` structs. Are DRuntime's AAs simply incompatible with 
`__gshared`? Do they need to be marked as `shared` to prevent 
these shenanigans?
Jul 27 2023
next sibling parent Dennis <dkorpel gmail.com> writes:
On Thursday, 27 July 2023 at 15:57:51 UTC, IchorDev wrote:
 The faults happen seemingly at random, and from pretty mundane 
 stuff like `if(auto x = y in z)` that run very often:
Are you accessing the AA from multiple threads?
Jul 27 2023
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, July 27, 2023 9:57:51 AM MDT IchorDev via Digitalmars-d-learn 
wrote:
 I've been getting a lot of segfaults from using associative
 arrays recently. The faults happen seemingly at random, and from
 pretty mundane stuff like `if(auto x = y in z)` that run very
 often:
 ```
 Segmentation fault.

 const(void*), scope const(TypeInfo)) inout ()

 ```

 I suspect that this is because they've all been placed inside
 `__ghared` structs. Are DRuntime's AAs simply incompatible with
 `__gshared`? Do they need to be marked as `shared` to prevent
 these shenanigans?
Strictly speaking, __gshared is really only intended for stuff like C globals (which can't be shared due to name-mangling issues). Using it on anything else can at least potentially cause problems due to the fact that the compiler will assume that the variable is thread-local. So, I would strongly advise against using __gshared in a case like this. In practice, you can often get away with it, because the compiler doesn't do much in the way of optimizing stuff based on objects being thread-local right now, but it's definitely risking problems with the type system if you used __gshared when you're not trying to do something like bind to a C global. What should normally be happening is that you use shared, and then when you've protected the object so that you know that it can only be accessed on the current thread by the section of code that you're in (e.g. by locking a mutex), you temporarily cast away shared to operate on the object via a thread-local reference. Then, before exiting that section of code and removing the protections that are preventing other threads from accessing the object (e.g. by unlocking the mutex), you make sure that you've gotten rid of all of the thread-local references to the object so that only the shared reference exists. That way, you don't accidentally mutate the object while it's not protected from access by other threads. Now, as to what's happening in your code that's causing segfaults, the most likely culprit would be that you're accessing the AA without actually having done anything to prevent other threads from accessing it at the same time (or your protections were inadequate). And because the object is being treated as thread-local by the compiler, it would be easy to have accidentally let a reference to it leak somewhere that wasn't being protected by whatever mutex you're using, whereas if the AA were shared, the only sections of code where you would have to worry about thread-local references escaping would be in the sections of code where you've cast away shared after locking the relevant mutex. So, similar to what happens with safe and trusted, using shared allows you to limit the code that you have to examine to find the problem. - Jonathan M Davis
Jul 27 2023
next sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Thursday, 27 July 2023 at 21:31:02 UTC, Jonathan M Davis wrote:
 Now, as to what's happening in your code that's causing 
 segfaults, the most likely culprit would be that you're 
 accessing the AA without actually having done anything to 
 prevent other threads from accessing it at the same time (or 
 your protections were inadequate).
I use a shared Mutex from `core.sync.mutex`. The AA itself and the `final class` that it's a member of is only ever accessed by one thread for now. The code inside that `final class` that accesses the AA is called by a function from another `final class`, and that other `final class` is used by multiple threads, but it's fully guarded by Mutex locks—I haven't had any issues with the millions of non-atomic reads and writes on its data I've performed from the two threads. Both objects are "static" (declared at module scope) if that matters at all.
 if the AA were shared, the only sections of code where you 
 would have to worry about thread-local references escaping 
 would be in the sections of code where you've cast away shared 
 after locking the relevant mutex. So, similar to what happens 
 with  safe and  trusted, using shared allows you to limit the 
 code that you have to examine to find the problem.
I was told that using `__gshared` is quite a bit faster at runtime than using `shared`, but I also don't really know anything concrete about `shared` because the spec is so incredibly vague about it.
Jul 27 2023
parent reply Kagamin <spam here.lot> writes:
On Friday, 28 July 2023 at 03:54:53 UTC, IchorDev wrote:
 I was told that using `__gshared` is quite a bit faster at 
 runtime than using `shared`, but I also don't really know 
 anything concrete about `shared` because the spec is so 
 incredibly vague about it.
The difference between them is purely formal if you're not on an crashes are frequent, can you reproduce a crash with a minimal amount of code, start many threads and access the locked AA concurrently.
Jul 27 2023
parent IchorDev <zxinsworld gmail.com> writes:
On Friday, 28 July 2023 at 04:13:13 UTC, Kagamin wrote:
 The difference between them is purely formal if you're not on 

I'm not sure that's correct. Also I always use the latest compiler versions where possible.
 start many threads and access the locked AA concurrently.
It's only being accessed from one thread, so such a test wouldn't be an accurate reproduction of the issue I'm having. Also I'm only using 2 threads total.
Jul 27 2023
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Thursday, 27 July 2023 at 21:31:02 UTC, Jonathan M Davis wrote:
 What should normally be happening is that you use shared, and 
 then when you've protected the object so that you know that it 
 can only be accessed on the current thread by the section of 
 code that you're in (e.g. by locking a mutex), you temporarily 
 cast away shared to operate on the object via a thread-local 
 reference. Then, before exiting that section of code and 
 removing the protections that are preventing other threads from 
 accessing the object (e.g. by unlocking the mutex), you make 
 sure that you've gotten rid of all of the thread-local 
 references to the object so that only the shared reference 
 exists. That way, you don't accidentally mutate the object 
 while it's not protected from access by other threads.
Doing this doesn't help, unfortunately. This is an abstract version what the problematic code looks like now: ```d import std.stdio: writeln; import core.sync.mutex; import core.thread; import core.time; static import core.stdc.stdlib; struct Pos{ long x,y; } shared Obj obj; final class Obj{ CacheData[Pos] cache; CacheData* getCache(Pos key){ if(auto item = key in cache){ if(++item.useCount >= 8){ cache.remove(key); //I thought this ^ might cause a use-after-free issue, but apparently not. } return item; } return null; } struct CacheData{ double[1000] data; uint useCount = 1; } double[1000] doStuff(Pos pos){ immutable data = (){ if(auto data = getCache(pos)){ return *data; }else{ double[1000] data; //initialisses it with something, this is arbirray though: foreach(ref item; data){ import std.random; item = uniform01(); } cache[pos] = CacheData(data); return CacheData(data); } }(); //do stuff with data return data.data; } } __gshared OtherObj otherObj; final class OtherObj{ shared Mutex mutex; __gshared ubyte[2^^18] data; this(long n){ obj = cast(shared Obj)alloc!Obj(n); mutex = new shared Mutex; } void callObj(Pos pos){ double[1000] data; { auto objRef = cast(Obj)obj; data = objRef.doStuff(pos); } //do things with returned value... } } void thread(){ bool run = true; Pos pos; while(run){ otherObj.mutex.lock(); foreach(i; 0..100){ otherObj.callObj(pos); //this is pretty arbirary: import std.random; if(uniform01() > 0.9){ auto v = uniform01(); if(v < 0.25) pos.x--; else if(v < 0.5) pos.y++; else if(v < 0.75) pos.y--; else pos.x++; } } otherObj.mutex.unlock(); Thread.sleep(1.seconds / 20); } } void main(){ otherObj = alloc!OtherObj(-7); assert(otherObj !is null); auto otherThread = new Thread(&thread); otherThread.start(); bool run = true; while(run){ if(!otherThread.isRunning()) otherThread.join(); otherObj.mutex.lock(); foreach(val; otherObj.data){ //do stuff } otherObj.mutex.unlock(); Thread.sleep(1.seconds / 80); } } T alloc(T, A...)(auto ref A args){ enum classSize = __traits(classInstanceSize, T); void* mem = core.stdc.stdlib.malloc(classSize); scope(failure) core.stdc.stdlib.free(mem); assert(mem !is null, "Out of memory"); mem[0..classSize] = __traits(initSymbol, T); T inst = cast(T)mem; static if(__traits(hasMember, T, "__ctor")){ inst.__ctor(__traits(parameters)); } return inst; } ``` Issue is, this code I posted actually runs just fine, unlike the real code. My actual code does this HIGHLY SUSPICIOUS thing when printing their length each time before using them: ``` 766 766 765 766 767 768 768 768 768 768 768 768 768 768 768 768 768 768 768 128000 Error Program exited with code -11 (backtrace:) const(void*), scope const(TypeInfo)) inout () ``` Therefore I must logically conclude that DRuntime's AAs are cursed! Unless this is a well-known issue or something... Thinking back, I've actually had them cause segfaults in non-threaded code, maybe `__gshared` was never the cause at all.
Jul 28 2023
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/28/23 4:39 AM, IchorDev wrote:

 Issue is, this code I posted actually runs just fine, unlike the real code.
 My actual code does this HIGHLY SUSPICIOUS thing when printing their 
 length each time before using them:
 ```
 766
 766
 765
 766
 767
 768
 768
 768
 768
 768
 768
 768
 768
 768
 768
 768
 768
 768
 768
 128000
 Error Program exited with code -11
 (backtrace:)

 const(void*), scope const(TypeInfo)) inout ()

 ```
My suspicion would be a race condition in your real code, and no race condition in this toy code. Second suspicion would be memory corruption (possibly caused by a race condition).
 Therefore I must logically conclude that DRuntime's AAs are cursed! 
 Unless this is a well-known issue or something...
AAs have worked forever. I've never had problems with them. Not saying there's not a bug here, maybe there is. But I would be surprised.
 Thinking back, I've actually had them cause segfaults in non-threaded 
 code, maybe `__gshared` was never the cause at all.
All `__gshared` does is give you storage that is accessible from all threads, but is not typed as `shared`. It doesn't change how the data is used. -Steve
Jul 28 2023
parent reply IchorDev <zxinsworld gmail.com> writes:
On Friday, 28 July 2023 at 11:15:31 UTC, Steven Schveighoffer 
wrote:
 All `__gshared` does is give you storage that is accessible 
 from all threads,
"All __gshared does is give you [a live bomb, ready to go off at any moment]" !! On Friday, 28 July 2023 at 14:10:16 UTC, Kagamin wrote:
 Your error is using allocating the object with malloc. Since gc 
 doesn't see your AA, the AA is freed and you get UAF.
Friend, I think you nailed it. After adding this I haven't been able to reproduce the segfault again: ```d this(long n){ (){ cache = new typeof(cache); assert(cache); import core.memory; core.memory.GC.addRoot(cast(void*)cache); }(); // ... ``` It did always happen at random, so perhaps I haven't spent enough time testing it yet, but I've gone far longer without it segfaulting than ever before.
Jul 28 2023
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/28/23 11:15 AM, IchorDev wrote:
 On Friday, 28 July 2023 at 11:15:31 UTC, Steven Schveighoffer wrote:
 All `__gshared` does is give you storage that is accessible from all 
 threads,
"All __gshared does is give you [a live bomb, ready to go off at any moment]" !!
It seems like it's not __gshared at all, but misusing malloc with GC pointers.
 
 On Friday, 28 July 2023 at 14:10:16 UTC, Kagamin wrote:
 Your error is using allocating the object with malloc. Since gc 
 doesn't see your AA, the AA is freed and you get UAF.
Friend, I think you nailed it. After adding this I haven't been able to reproduce the segfault again: ```d this(long n){     (){         cache = new typeof(cache);         assert(cache);         import core.memory;         core.memory.GC.addRoot(cast(void*)cache);     }();     // ... ``` It did always happen at random, so perhaps I haven't spent enough time testing it yet, but I've gone far longer without it segfaulting than ever before.
This is the wrong approach, it's the allocating call that should add the root (actually a range). For instance, the mutex is not added, that might be collected. Or if you add more GC-pointing things into the class, that could be a problem. What I'd do is: ```d T alloc(T, A...)(auto ref A args){ enum classSize = __traits(classInstanceSize, T); void* mem = core.stdc.stdlib.malloc(classSize); assert(mem !is null, "Out of memory"); core.memory.GC.addRange(mem[0 .. classSize]); scope(failure) { core.memory.GC.removeRange(mem[0 .. classSize]); core.stdc.stdlib.free(mem); } T inst = cast(T)mem; inst.emplace(__traits(parameters)); return inst; } ``` And of course, a `dealloc` that removes the range should also be added. -Steve
Jul 28 2023
parent reply IchorDev <zxinsworld gmail.com> writes:
On Friday, 28 July 2023 at 21:22:01 UTC, Steven Schveighoffer 
wrote:
 ```d
 T alloc(T, A...)(auto ref A args){
     enum classSize = __traits(classInstanceSize, T);
     void* mem = core.stdc.stdlib.malloc(classSize);
     assert(mem !is null, "Out of memory");
     core.memory.GC.addRange(mem[0 .. classSize]);
     scope(failure) {
        core.memory.GC.removeRange(mem[0 .. classSize]);
        core.stdc.stdlib.free(mem);
     }
     T inst = cast(T)mem;
     inst.emplace(__traits(parameters));
     return inst;
 }
 ```

 And of course, a `dealloc` that removes the range should also 
 be added.

 -Steve
This also works *unless* I do a `foreach` over the AA's keys & values (`foreach(k,v; aa)`), in which case it still segfaults at random. Deconstructing the foreach into a for works though, for whatever reason.
Aug 09 2023
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/10/23 1:59 AM, IchorDev wrote:

 
 This also works *unless* I do a `foreach` over the AA's keys & values 
 (`foreach(k,v; aa)`), in which case it still segfaults at random. 
 Deconstructing the foreach into a for works though, for whatever reason.
That shouldn't matter. The key thing is, if you are going to use malloc, you need to register the memory with the GC. The GC doesn't scan memory it doesn't know about. But my main point was the class itself shouldn't be registering itself -- it did not make the decision to malloc. You should register where you allocate. I also highly recommend using `emplace` to handle all the sticky issues with lifetime/construction. -Steve
Aug 10 2023
parent reply IchorDev <zxinsworld gmail.com> writes:
On Thursday, 10 August 2023 at 15:20:28 UTC, Steven Schveighoffer 
wrote:
 That shouldn't matter.
Well, it does here. The AA is mutated during the loop, so perhaps this is an optimisation quirk where it works with `for` but segfaults in `foreach`? I've pretty thoroughly abused the `for` version and I haven't gotten it to segfault yet.
 I also highly recommend using `emplace` to handle all the 
 sticky issues with lifetime/construction.
Have not run into the aforementioned sticky issues yet, but I can't even find `emplace`'s docs anywhere now. I recall it being incompatible with classes that have nogc/nothrow constructors though, which made it pretty useless to me, and it wouldn't work with BetterC, which was a requirement for the allocation wrapper I was writing at the time.
Aug 12 2023
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/12/23 5:55 AM, IchorDev wrote:
 On Thursday, 10 August 2023 at 15:20:28 UTC, Steven Schveighoffer wrote:
 That shouldn't matter.
Well, it does here. The AA is mutated during the loop, so perhaps this is an optimisation quirk where it works with `for` but segfaults in `foreach`? I've pretty thoroughly abused the `for` version and I haven't gotten it to segfault yet.
oh yeah. That is not allowed. Any mutation of the AA during iteration can invalidate existing foreach or ranges over the AA. Basically, a rehash can cause the buckets to jumble up, and in that case, the "current index" can be changed to point at a null bucket. More here: https://dlang.org/spec/statement.html#foreach_restrictions In fact, that statement is way too broad. Invalidation of iteration should be based on the type's requirements. We really should put a note in the AA spec page.
 I also highly recommend using `emplace` to handle all the sticky 
 issues with lifetime/construction.
Have not run into the aforementioned sticky issues yet, but I can't even find `emplace`'s docs anywhere now.
https://dlang.org/phobos/core_lifetime.html#.emplace
 I recall it being incompatible with 
 classes that have  nogc/nothrow constructors though, which made it 
 pretty useless to me, and it wouldn't work with BetterC, which was a 
 requirement for the allocation wrapper I was writing at the time.
It probably won't work with betterC, but that's probably just because of linker errors. Any attribute requirements would be inferred based on the attributes of your constructor, because emplace is a template. -Steve
Aug 15 2023
parent IchorDev <zxinsworld gmail.com> writes:
On Tuesday, 15 August 2023 at 17:36:13 UTC, Steven Schveighoffer 
wrote:
 On 8/12/23 5:55 AM, IchorDev wrote:
 On Thursday, 10 August 2023 at 15:20:28 UTC, Steven 
 Schveighoffer wrote:
 That shouldn't matter.
Well, it does here. The AA is mutated during the loop, so perhaps this is an optimisation quirk where it works with `for` but segfaults in `foreach`? I've pretty thoroughly abused the `for` version and I haven't gotten it to segfault yet.
oh yeah. That is not allowed. Any mutation of the AA during iteration can invalidate existing foreach or ranges over the AA.
And yet this apparently doesn’t apply to an equivalent `for` somehow. Perhaps `foreach` shouldn’t have this arbitrary problem in the first place…
 In fact, that statement is way too broad. Invalidation of 
 iteration should be based on the type's requirements.

 We really should put a note in the AA spec page.
Probably yeah.
 https://dlang.org/phobos/core_lifetime.html#.emplace

 Any attribute requirements would be inferred based on the 
 attributes of your constructor, because emplace is a template.

 -Steve
Well the docs don’t say that at all, and the code is an unreadable mess. I dunno how anymore is meant to figure that out?
Aug 15 2023
prev sibling parent Kagamin <spam here.lot> writes:
Your error is using allocating the object with malloc. Since gc 
doesn't see your AA, the AA is freed and you get UAF.
Jul 28 2023