digitalmars.D.learn - Program crash: GC destroys an object unexpectedly

reply eugene <dee0xeed gmail.com> writes:
Code snippets

```d
class Stopper : StageMachine {

     enum ulong M0_IDLE = 0;
     Signal sg0, sg1;

     this() {
         super("STOPPER");

         Stage init, idle;
         init = addStage("INIT", &stopperInitEnter);
         idle = addStage("IDLE", &stopperIdleEnter);

         init.addReflex("M0", idle);

         idle.addReflex("S0", &stopperIdleS0);
         idle.addReflex("S1", &stopperIdleS1);
     }

     void stopperInitEnter() {
         sg0 = newSignal(Signal.sigInt);
         sg1 = newSignal(Signal.sigTerm);
         msgTo(this, M0_IDLE);
     }
```

The instance of Stopper is created in the scope of main():

```d
void main(string[] args) {

     auto stopper = new Stopper();
     stopper.run();
```

stopperInitEnter(), where sg0 and sg1 are created, is invoked
inside the run() method.

About 6 seconds after the start, the (dummy) destructors of sg0
and sg1 are called:

    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 24) this   0x7fa5410d4f60
    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 25) this   0x7fa5410d4f90

Then, after pressing ^C (SIGINT), the program gets SIGSEGV, since
the references to sg0 and sg1 are no longer valid (they are
"sitting" in the epoll_event structure).

At first I thought I was being stupid and missing some obvious
mistake, but...
The crash happens if the program is compiled with dmd
(v2.097.2).
When using gdc (or ldc, both from the official Debian 8 repo),
I do not observe any crashes: the program may run for hours,
and after being interrupted by ^C it terminates as expected.

And the strangest thing is this: with gdc and the -Os flag,
the program behaves exactly as when compiled with the fresh dmd:
the destructors for sg0 and sg1 are called soon after program
start.

I do not understand at all why the GC considers sg0 and sg1
unreferenced, or why the old gdc (without -Os) and the old ldc
do not.
Sep 13 2021
next sibling parent reply user1234 <user1234 12.de> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 [...]
At first glance, and given that some of the code needed to diagnose the problem accurately is missing: `new Stopper` allocates using the GC. It is then a "GC range": its content will be scanned and handled by the GC, including `sg0` and `sg1`. So far everything is simple. The problem seems to lie in `newSignal()`, which "would" not allocate using the GC. So when the GC reaches the `sg0` and `sg1` values, indirectly when scanning a `Stopper` instance, it thinks that they are unused and, consequently, frees them. If you don't want them to be managed by the GC, remove them from the GC using `removeRange()`.
Sep 13 2021
next sibling parent user1234 <user1234 12.de> writes:
On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 [...]
Oh sorry, or maybe the opposite, so `addRange`...
Sep 13 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
 The problems seems to lies in `newSignal()` which "would" not 
 allocate using the GC.
    final Signal newSignal(int signum) {
        Signal sg = new Signal(signum);
        sg.owner = this;
        sg.number = sg_number++;
        sg.register();
        return sg;
    }

full src is here http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
Sep 13 2021
next sibling parent reply user1234 <user1234 12.de> writes:
On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
 On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
 The problems seems to lies in `newSignal()` which "would" not 
 allocate using the GC.
thx, so the problem is not what I suspected it to be (mixed gc-managed and manually managed memory). sorry...
Sep 13 2021
parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:56:34 UTC, user1234 wrote:
 thx, so the problem is not what I suspected to be (mixed 
 gc-managed and manually managed memory). sorrry...
I am actually a C coder and do not have much experience with GC languages, so I have not even tried to use D without the GC yet; I just want to understand how all that GC magic works. The program does not contain manual malloc()/free(); I am just not ready for such a mix.
Sep 13 2021
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/13/21 1:54 PM, eugene wrote:
 On Monday, 13 September 2021 at 17:40:41 UTC, user1234 wrote:
 The problems seems to lies in `newSignal()` which "would" not allocate 
 using the GC.
     final Signal newSignal(int signum) {
         Signal sg = new Signal(signum);
         sg.owner = this;
         sg.number = sg_number++;
         sg.register();
         return sg;
     }

 full src is here http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
The GC only scans things that it knows about.

Inside your EventQueue you have this code:

```d
void registerEventSource(EventSource es) {
    auto e = EpollEvent(0, es);
    int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
    assert(r == 0, "epoll_ctl(ADD) failed");
}

EventQueue opOpAssign(string op)(EventSource es)
if (("+" == op) || ("~" == op)) {
    registerEventSource(es);
    return this;
}

void deregisterEventSource(EventSource es) {
    auto e = EpollEvent(0, es);
    int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
    assert(r == 0, "epoll_ctl(DEL) failed");
}

EventQueue opOpAssign(string op)(EventSource es)
if ("-" == op) {
    deregisterEventSource(es);
    return this;
}
```

And you are registering your signals using the `+=` operator.

What is happening here is that `epoll_ctl` is adding your event source to a *C allocated* structure (namely the epoll struct, allocated by `epoll_create1`, and possibly even managed by the OS). The GC does not have access to this struct, so if that's the only reference to them, they will get cleaned up by the GC.

Now, with your stopper code that you showed, it looks like you are storing the reference to stopper right on the main stack frame. This *should* prevent those from being destroyed, since Stopper has a reference to both signals.

But I would recommend using `core.memory.GC.addRoot` on your EventSource when registering it with epoll, and using `core.memory.GC.removeRoot` when unregistering. That will ensure they do not get cleaned up before being unregistered. If this doesn't fix the problem, perhaps there is some other issue happening.

-Steve
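
The pinning suggested above could look roughly like the following. This is only a minimal sketch: `EventSource` here is a hypothetical stand-in for the project's class, `pin`/`unpin` are made-up helper names, and the `destroyed` counter exists only to make the effect observable. The only library calls used are `GC.addRoot`, `GC.removeRoot` and `GC.collect` from `core.memory`.

```d
import core.memory : GC;

// Hypothetical stand-in for the project's EventSource class.
class EventSource {
    int id;                // file descriptor
    static int destroyed;  // counts destructor runs, for demonstration only

    this(int id) { this.id = id; }
    ~this() { ++destroyed; }
}

// Call before handing the reference to epoll_ctl(..., EPOLL_CTL_ADD, ...):
// the object becomes a GC root and is not collected while it is pinned.
void pin(EventSource es) {
    GC.addRoot(cast(void*) es);
}

// Call after epoll_ctl(..., EPOLL_CTL_DEL, ...): the object is again
// subject to normal collection once nothing else references it.
void unpin(EventSource es) {
    GC.removeRoot(cast(void*) es);
}

unittest {
    auto es = new EventSource(24);
    pin(es);
    GC.collect();                       // pinned object must survive
    assert(EventSource.destroyed == 0);
    unpin(es);
}
```

With helpers like these, `registerEventSource` would call `pin(es)` and `deregisterEventSource` would call `unpin(es)`, so the responsibility sits with the code that hides the reference from the GC rather than with every caller.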
Sep 13 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 18:42:47 UTC, Steven 
Schveighoffer wrote:
 And you are registering your signals using the `+=` operator.
That was a sort of exercise with operator overloading.
 Now, with your stopper code that you showed, it looks like you 
 are storing the reference to stopper right on the main stack 
 frame. This *should* prevent those from being destroyed, since 
 Stopper has a reference to both signals.
Exactly - this is the main point of my confusion. By my expectation, the GC should not mark those as unreferenced. Also, notice those dynamic arrays

    void main(string[] args) {

        RxSm[] rxMachines;
        auto rxPool = new RestRoom();
        foreach (k; 0 .. nConnections) {
            auto sm = new RxSm(rxPool);
            rxMachines ~= sm;
            sm.run();
        }

rxMachines (and the like) are not needed by the program itself; they are there just to keep references for the GC.
Sep 13 2021
prev sibling parent reply Tejas <notrealemail gmail.com> writes:
On Monday, 13 September 2021 at 18:42:47 UTC, Steven 
Schveighoffer wrote:
 On 9/13/21 1:54 PM, eugene wrote:
 [...]
The GC only scans things that it knows about. Inside your EventQueue you have this code: [...]
Umm is it okay that he declared variables `init` and `idle` of type `Stage` inside the constructor? Maybe that has something to do with this? Also, calling a variable `init` could be problematic since the compiler assigns a property of the same name to every single type?
Sep 13 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 05:49:58 UTC, Tejas wrote:
 Umm is it okay that he declared variables `init` and `idle` of 
 type `Stage` inside the constructor?
States of a machine are in associative array. All other machines create their states in constructor, local variables are for using addReflex() method. But this stopper machine is 'special' for GC somehow.
Sep 14 2021
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/14/21 1:49 AM, Tejas wrote:
 On Monday, 13 September 2021 at 18:42:47 UTC, Steven Schveighoffer wrote:
 On 9/13/21 1:54 PM, eugene wrote:
 [...]
The GC only scans things that it knows about. Inside your EventQueue you have this code: [...]
Declaring a member/field named `init` is likely a bad idea, but this is not a member, it's just a variable. That's fine. `idle` doesn't mean anything special to D. This project is too big and complex for me to diagnose by just reading, it would take some effort, and I don't have the time, sorry. Though as I have learned helping C converts before, most of the time things like this have to do with forgetting to store a GC reference somewhere. It can be subtle too... I still recommend pinning the object when adding the epoll event and seeing if that helps. -Steve
Sep 14 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven 
Schveighoffer wrote:
 Though as I have learned helping C converts before, most of the 
 time things like this have to do with forgetting to store a GC 
 reference somewhere.
Yeah, in my first version I had

```d
foreach (k; 0 .. nConnections) {
    auto sm = new EchoClient(rxPool, txPool);
    sm.run();
}
```

instead of

```d
EchoClient[] wrkMachines;
foreach (k; 0 .. nConnections) {
    auto sm = new EchoClient(rxPool, txPool);
    wrkMachines ~= sm;
    sm.run();
}
```

and even

```d
{
    auto stopper = new Stopper();
    stopper.run();
}
```

:)
 I still recommend pinning the object when adding the epoll 
 event and seeing if that helps.
I understand your idea, but even if this will help, the question remains - why that particular object is so special for GC.
Sep 14 2021
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/14/21 8:42 AM, eugene wrote:
 On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer wrote:
 I still recommend pinning the object when adding the epoll event and 
 seeing if that helps.
Philosophically, it places the responsibility of making sure the object is valid while using it on the thing that chooses to store it outside the GC's view. Looking at your examples, you are having to store these object references elsewhere, surrounding seemingly innocuous and normal D usage of objects. You have put the burden on the caller to make sure the implementation details are sound. But I agree that a superficial reading of your code seems like it ought to not be collected, and that problem is also worth figuring out. I have high confidence that it's probably not a design flaw in the GC, but rather some misunderstanding of GC-allocated lifetimes in your code. But that doesn't mean it's not actually a bug somewhere in D. -Steve
Sep 14 2021
parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:52:44 UTC, Steven 
Schveighoffer wrote:
 But I agree that a superficial reading of your code seems like 
 it ought to not be collected, and that problem is also worth 
 figuring out. I have high confidence that it's probably not a 
 design flaw in the GC, but rather some misunderstanding of 
 GC-allocated lifetimes in your code. But that doesn't mean it's 
 not actually a bug somewhere in D.
run the server (do not run client):

    'LISTENER INIT' got 'M0' from 'SELF'
    'LISTENER' registered 104 (esrc.TCPListener)
    'LISTENER' enabled 104 (esrc.TCPListener)
    'LISTENER' enabled 105 (esrc.Signal)
    'LISTENER' enabled 106 (esrc.Signal)

wait > 6 seconds
press ^C
observe

    ___!!!___edsm.StageMachine.~this(): WORKER-95 destroyed...
    ___!!!___edsm.StageMachine.~this(): WORKER-96 destroyed...
    ___!!!___edsm.StageMachine.~this(): LISTENER destroyed...

run client (do not run the server)
observe

    'CLIENT-9 CONN' got 'M2' from 'TX-1'
    CLIENT-9:client.EchoClient.clientConnM2() : connection to 'localhost:1111' failed error111)
    CLIENT-9:client.EchoClient.clientConnM2() : connection to 'localhost:1111' failed(Connection refused)

press ^C
observe

    ___!!!___edsm.StageMachine.~this(): STOPPER destroyed...

run server again
run client like this:

    ./echo-client | grep owner

wait >6.seconds
see

    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 24) this 0x7fa6cf12cf60
    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 25) this 0x7fa6cf12cf90

WHY this is not happening with echo-server???
Sep 14 2021
prev sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 14 September 2021 at 12:42:51 UTC, eugene wrote:
 I understand your idea, but even if this will help, the question
 remains - why that particular object is so special for GC.
I had a problem just like this before because I was sending objects through the pipe. And while they were in the pipe - after send but before receive on the other side - it was liable to be collected. idk your code though if you have a separate reference to the object you should be ok. but here's the comment from my code when i broke it https://github.com/adamdruppe/arsd/blob/master/eventloop.d#L180
Sep 14 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:53:27 UTC, Adam D Ruppe wrote:
 I had a problem just like this before because I was sending 
 objects through the pipe.
This reminds me of my (not very successful) attempts to implement the idea in Rust:

```rust
pub struct Edsm {
    name: String,
    pub states: Vec<State>,
    current: usize,
    // pub state : *mut State, (?)
    pub data: *const void, // long live void* !!!
    // pub buddy : &'a Edsm, // ... and a hell begins...
    mb: Option<Box<EventSource>>,
    /* self-pipe write end fd, for sending internal events to this machine */
    mxfd: i32,
    io: Option<Box<EventSource>>,
    pub tm: Vec<EventSource>,
    sg: Vec<EventSource>,
    // pub fs : Option<Box<EventSource>>,
    // pub ecap : &'a mut Ecap, // Welcome to <'x> HELL again!!!
    ecap: *mut Ecap,
    running: bool, /* self.run() has been invoked */
}
```

When something (a struct, for example) goes into a queue (a DList, for example), it is out of ANY scope, and clever things like the borrow checker cannot analyze its lifetime, oops...
Sep 14 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:53:27 UTC, Adam D Ruppe wrote:
 I had a problem just like this before because I was sending 
 objects through the pipe. And while they were in the pipe -
```rust
pub fn msg(&self, code: u32) {
    let ptr: *const u32 = &code;
    let n = unsafe { write(self.mxfd, ptr as *const void, 4) };
    if -1 == n {
        panic!("write({}): {:?}", self.mxfd, Error::last_os_error());
    }
}
```

I failed to implement a message queue as a wrapper over a doubly linked list; the Rust borrow checker has beaten me :)
Sep 14 2021
prev sibling next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven 
Schveighoffer wrote:
 This project is too big and complex
Really, "too big and complex"? It's as simple as a tabouret :) It's just a toy/hobby 'project'.
Sep 14 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 14 September 2021 at 14:40:55 UTC, eugene wrote:
 On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven 
 Schveighoffer wrote:
 This project is too big and complex
A 5-pound phone isn't "too heavy" for an adult to carry but it won't sell well. It's not just about capabilities but what efforts people are willing to expend.

I would troubleshoot your issue by gradually making it @safe and thinking about exceptions. One exception I didn't think about earlier was the 'misaligned pointer' one that I said I suppressed just to find the next @safe complaint: https://dlang.org/spec/garbage.html says:
Do not misalign pointers if those pointers may point into the GC 
heap,
So even if the lifetimes of your EventSource structs are fixed, the GC can reap the object they're pointing to. You could fix this by having a 128-bit struct and passing C an index into it, so to speak.
Sep 14 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 14:56:00 UTC, jfondren wrote:
 You could fix this by having a 128-bit struct and passing C an 
 index into it
It is another "not so funny joke", isn't it? Look

```c
typedef union epoll_data {
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

struct epoll_event {
    uint32_t events;   /* Epoll events */
    epoll_data_t data; /* User data variable */
} __EPOLL_PACKED;

// inside the system
struct epoll_event {
    __u32 events;
    __u64 data;
} EPOLL_PACKED;
```

and notice

```d
align (1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
    /* just do not want to use that union, epoll_data_t */
}
static assert(EpollEvent.sizeof == 12);
```
Sep 14 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 14 September 2021 at 15:37:27 UTC, eugene wrote:
 On Tuesday, 14 September 2021 at 14:56:00 UTC, jfondren wrote:
 You could fix this by having a 128-bit struct and passing C an 
 index into it
It is another "not so funny joke", isn't it?
No. And when was the first one?
 ```d
 align (1) struct EpollEvent {
     align(1):
     uint event_mask;
     EventSource es;
     /* just do not want to use that union, epoll_data_t */
 }
 static assert(EpollEvent.sizeof == 12);
 ```
That's 96 bits. Add 32.

```d
class EventSource { }

align(1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
}

struct OuterEpollEvent {
    int _dummy;
    uint event_mask;
    EventSource es;
}

EpollEvent* epollEvent(return ref OuterEpollEvent ev) @trusted {
    return cast(EpollEvent*) &ev.event_mask;
}

void dumpEpollEvent(EpollEvent* ev) @trusted {
    import std.stdio : writeln;

    writeln(*ev);
}

unittest {
    // can't be @safe:
    // Error: field `EpollEvent.es` cannot modify misaligned pointers in `@safe` code
    EpollEvent ev;
    ev.es = new EventSource; // misaligned
}

@safe unittest {
    // this is fine
    OuterEpollEvent ev;
    ev.event_mask = 0;
    ev.es = new EventSource; // not misaligned
    ev.epollEvent.dumpEpollEvent;
}
```
Sep 14 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:
 No. And when was the first one?
here: On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
  auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
What? Allocate struct epoll_event on the heap? It is a feeble joke ;)

```c
static int ecap__add(int fd, void *dptr)
{
    struct epoll_event waitfor = {0};
    int flags, r;

    waitfor.data.ptr = dptr;

    r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
    if (-1 == r) {
```

All fd's (sockets, timers etc) are added the same way and corresponding EventSources are not destroyed by GC.
Sep 14 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 14 September 2021 at 16:15:20 UTC, eugene wrote:
 On Tuesday, 14 September 2021 at 16:07:00 UTC, jfondren wrote:
 No. And when was the first one?
here: On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
  auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
What? Allocate struct epoll_event on the heap? It is a feeble joke ;)
It is an example of deliberately static storage that does not fix your problem, thereby proving that the broken lifetimes of the struct are not your only problem. I explained that one at the time, and I explained this one. If it comes with an explanation, it's probably not a joke.
 ```c
     static int ecap__add(int fd, void *dptr)
     {
         struct epoll_event waitfor = {0};
            int flags, r;

         waitfor.data.ptr = dptr;

         r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
         if (-1 == r) {
 ```

 All fd's (sockets, timers etc) are added the same way
 and corresponding EventSources are not destroyed by GC.
GC needs to be able to stop your program and find all of the live objects in it. The misaligned pointer and the reference-containing struct that vanishes on the return of your corresponding function are both problems for this.
Sep 14 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
 GC needs to be able to stop your program
nice fantasies...
 and find all of the live objects in it. The misaligned pointer 
 and the reference-containing struct that vanishes on the return 
 of your corresponding function are both problems for this.
where did you find 'misaligned pointer'?...
Sep 14 2021
next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 14 September 2021 at 16:56:52 UTC, eugene wrote:
 On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
 GC needs to be able to stop your program
nice fantasies...
 and find all of the live objects in it. The misaligned pointer 
 and the reference-containing struct that vanishes on the 
 return of your corresponding function are both problems for 
 this.
where did you find 'misaligned pointer'?...
It doesn't seem like communication between us is possible, in the "a five-pound phone won't sell" way. You can find this answer explained with code in an earlier post. My suggestion remains: try troubleshooting by making your program @safe.
Sep 14 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:
 It doesn't seem like communication between us is possible
and you are wrong, as usual ,)
 in the "a five-pound phone won't sell" way.
I am not a 'selling boy'
 My suggestion remains: try troubleshooting by making your 
 program @safe.
Please, take that clever bot away.
Sep 14 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/14/21 2:05 PM, eugene wrote:
 On Tuesday, 14 September 2021 at 17:02:32 UTC, jfondren wrote:
 It doesn't seem like communication between us is possible
and you are wrong, as usual ,)
 in the "a five-pound phone won't sell" way.
I am not a 'selling boy'
 My suggestion remains: try troubleshooting by making your program @safe.
Please, take that clever bot away.
People are trying to help you here. With that attitude, you are likely to stop getting help. -Steve
Sep 14 2021
parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 18:33:33 UTC, Steven 
Schveighoffer wrote:
 People are trying to help you here.
Then, answer the questions. Why those sg0 and sg1 are 'collected' by this so f... antstic GC?
Sep 14 2021
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 9/14/21 9:56 AM, eugene wrote:

 On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
 The misaligned pointer and the
 reference-containing struct that vanishes on the return of your
 corresponding function are both problems for this.
where did you find 'misaligned pointer'?...
I think it's the align(1) for EpollEvent. I was able to reproduce the segmentation fault and was seemingly able to fix it by making the EventSource class references alive by adding a constructor:

    align (1) struct EpollEvent {
        align(1):
        uint event_mask;
        EventSource es;

        this(uint event_mask, EventSource es) {
            this.event_mask = event_mask;
            this.es = es;
            living ~= es;  // <-- Introduced this constructor for this line
        }
        /* just do not want to use that union, epoll_data_t */
    }

    // Here is the array that keeps EventSource alive:
    EventSource[] living;

If that really is the fix, of course the references must be taken out of that container when possible.

Ali
Sep 14 2021
next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:
 On 9/14/21 9:56 AM, eugene wrote:

 On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
 The misaligned pointer and the
 reference-containing struct that vanishes on the return of
your
 corresponding function are both problems for this.
where did you find 'misaligned pointer'?...
Yep. This patch is sufficient to prevent the segfault:

```
diff --git a/engine/ecap.d b/engine/ecap.d
index 71cb646..d57829c 100644
--- a/engine/ecap.d
+++ b/engine/ecap.d
@@ -32,6 +32,7 @@ final class EventQueue {
     private int id;
     private bool done;
     private MessageQueue mq;
+    private EventSource[] sources;
 
     private this() {
         id = epoll_create1(0);
@@ -52,6 +53,7 @@ final class EventQueue {
 
     void registerEventSource(EventSource es) {
         auto e = EpollEvent(0, es);
+        sources ~= es;
         int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, &e);
         assert(r == 0, "epoll_ctl(ADD) failed");
     }
@@ -63,7 +65,10 @@ final class EventQueue {
     }
 
     void deregisterEventSource(EventSource es) {
+        import std.algorithm : countUntil, remove;
+
         auto e = EpollEvent(0, es);
+        sources = sources.remove(sources.countUntil(es));
         int r = epoll_ctl(id, EPOLL_CTL_DEL, es.id, &e);
         assert(r == 0, "epoll_ctl(DEL) failed");
     }
```

Going through the project and adding `@safe:` to the top of everything results in these errors: https://gist.github.com/jrfondren/c7f7b47be057273830d6a31372895895

some I/O, some @system functions, some weird C APIs ... and misaligned assignments to EpollEvent.es. So debugging with @safe isn't bad, but I'd still like rustc-style error codes:

```
engine/ecap.d(89): Error E415: field `EpollEvent.es` cannot assign to misaligned pointers in `@safe` code

$ dmd --explain E415

Yeah see, the garbage collector only looks for pointers at pointer-aligned addresses.
```
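
The same idea (keep a GC-visible reference for as long as the source is registered with epoll) can also be sketched with an associative array keyed by file descriptor instead of a linear array. Everything below is hypothetical: `SourceRegistry`, `hold` and `release` are illustrative names, not part of the project, and `EventSource` is a stand-in class.

```d
// Hypothetical stand-in for the project's EventSource class.
class EventSource {
    int id; // file descriptor
    this(int id) { this.id = id; }
}

// Keeps registered sources reachable from GC-managed memory.
// The registry itself must live somewhere the GC scans,
// e.g. as a field of the EventQueue object.
struct SourceRegistry {
    private EventSource[int] byFd;

    // Remember the source; call next to epoll_ctl(EPOLL_CTL_ADD).
    void hold(EventSource es) {
        byFd[es.id] = es;
    }

    // Forget the source; call next to epoll_ctl(EPOLL_CTL_DEL).
    // Returns false if the source was never held.
    bool release(EventSource es) {
        return byFd.remove(es.id);
    }

    size_t length() const {
        return byFd.length;
    }
}

unittest {
    SourceRegistry reg;
    auto es = new EventSource(24);
    reg.hold(es);
    assert(reg.length == 1);
    assert(reg.release(es));
    assert(!reg.release(es)); // second release is a harmless no-op
}
```

Compared with the array in the patch, releasing a source that was never held simply returns `false` here, whereas `countUntil` would return -1 in that case.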
Sep 15 2021
parent eugene <dee0xeed gmail.com> writes:
On Wednesday, 15 September 2021 at 23:07:45 UTC, jfondren wrote:
 Yep. This patch is sufficient to prevent the segfault:
Your idea (hold references to all event sources somewhere) is quite clear, but it confuses me a bit, since

1) there **are** references to all event sources **already**: they are data members in StageMachine subclasses.
2) only two of the many event sources are destroyed, namely those referenced by sg1 and sg0 in the Stopper machine of echo-client. All other event sources are not destroyed.
Sep 19 2021
prev sibling next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 20:59:14 UTC, Ali Çehreli wrote:
 On 9/14/21 9:56 AM, eugene wrote:

 On Tuesday, 14 September 2021 at 16:43:50 UTC, jfondren wrote:
 The misaligned pointer and the
 reference-containing struct that vanishes on the return of
your
 corresponding function are both problems for this.
where did you find 'misaligned pointer'?...
I think it's the align(1) for EpollEvent.
The definition of this struct was taken from /usr/include/dmd/druntime/import/core/sys/linux/epoll.d

```d
version (X86_Any) {
    align(1) struct epoll_event {
        align(1):
        uint events;
        epoll_data_t data;
    }
}
```

I am using my own definition because the data field has no special meaning for the Linux kernel; it is returned as is by epoll_wait(). I always use this field as a pointer to EventSource. This struct has to be 12 bytes on the x86 arch; in /usr/include/linux/eventpoll.h it looks like this:

```c
struct epoll_event {
    __u32 events;
    __u64 data;
} EPOLL_PACKED;
```

At some point I had a different definition (align only inside):

```d
struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
    /* just do not want to use that union, epoll_data_t */
}
```

But it turned out that:

1) a relatively fresh gdc (from Linux Mint 19) does the right thing: the structure is packed and is 12 bytes in size.
2) the old gdc (from Debian 8) produces a 16-byte EpollEvent, and both programs get SIGSEGV right after the first return from epoll_wait(), hence this check:

```d
static assert(EpollEvent.sizeof == 12);
```

If the reason for the crash were the EpollEvent alignment, the programs would always segfault very soon after start, right after the very first return from epoll_wait().
Sep 18 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:
 The definition of this struct was taken from
 /usr/include/dmd/druntime/import/core/sys/linux/epoll.d
...
 If the reason for crash was in EpollEvent alignment,
 programs would segfaults always very soon after start,
 just right after the very first return from epoll_wait().
The struct's fine as far as libc and the kernel are concerned. epoll_wait is not even using those 64 bits or interpreting them as containing any kind of data, it's just moving them around for the caller to use. It's also not a hardware error to interpret those bits where they are as a pointer. They are however not 64-bit aligned so D's GC is collecting objects that only they point to.
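
The alignment point can be checked at compile time. The following is a small sketch that assumes a 64-bit target (8-byte references) and reuses the two layouts from earlier in the thread; `pointerAligned` is a hypothetical helper added only for illustration. Note that it checks the field's offset inside the struct, so it says the reference is misaligned only if the struct itself starts at a pointer-aligned address.

```d
class EventSource { }

// Packed layout from the thread: 4-byte mask, then the reference.
align(1) struct EpollEvent {
    align(1):
    uint event_mask;
    EventSource es;
}

// Padded layout suggested earlier in the thread.
struct OuterEpollEvent {
    int _dummy;
    uint event_mask;
    EventSource es;
}

// True if a field at this offset sits on a pointer-size boundary
// (relative to the start of the struct).
bool pointerAligned(size_t offset) {
    return offset % (void*).sizeof == 0;
}

// Assumption: 64-bit target, where class references are 8 bytes wide.
static if ((void*).sizeof == 8) {
    static assert(EpollEvent.sizeof == 12);
    static assert(EpollEvent.es.offsetof == 4);      // not a multiple of 8
    static assert(OuterEpollEvent.es.offsetof == 8); // pointer-aligned
    static assert(!pointerAligned(EpollEvent.es.offsetof));
    static assert(pointerAligned(OuterEpollEvent.es.offsetof));
}
```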
Sep 18 2021
parent eugene <dee0xeed gmail.com> writes:
On Saturday, 18 September 2021 at 09:54:05 UTC, jfondren wrote:
 On Saturday, 18 September 2021 at 09:39:24 UTC, eugene wrote:
 The definition of this struct was taken from
 /usr/include/dmd/druntime/import/core/sys/linux/epoll.d
...
 If the reason for crash was in EpollEvent alignment,
 programs would segfaults always very soon after start,
 just right after the very first return from epoll_wait().
 The struct's fine as far as libc and the kernel are concerned.
 epoll_wait is not even using those 64 bits or interpreting them
 as containing any kind of data, it's just moving them around for
 the caller to use. It's also not a hardware error to interpret
 those bits where they are as a pointer.
Exactly.
 They are however not 64-bit aligned so D's GC is collecting 
 objects that only they point to.
Ok...

1) There are 303 event sources in echo-server: 200 in RX machines (100 Ios and 100 Timers), 100 Ios in TX machines and finally 3 in Listener (one Io and two signals, **sg0 and sg1**). All of these 303 references in EpollEvent structs are 'misaligned' in this sense, but **none of the corresponding objects are collected**.
2) There are 22 event sources in echo-client: 20 in RX machines (10 Ios and 10 Timers), 10 Ios in TX machines and finally 2 in the Stopper machine (**sg0 and sg1**, for handling SIGINT and SIGTERM), but **only the last two are collected**; all others are not. Here is the problem.
Sep 19 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
 reference-containing struct that vanishes on the return of your 
 corresponding function
I do not think it's a problem, otherwise **both programs would not work at all**. However, echo-server works without any surprises; echo-client also works, except that EventSources pointed by sg0 and sg1 data members in the Stopper instance, are cleared by GC soon after echo-client start. This does not mean that echo-client gets SIGSEGV right after those objects are destroyed, no - the crash happens later, upon receiving SIGINT or SIGTERM.
Sep 19 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Sunday, 19 September 2021 at 08:51:31 UTC, eugene wrote:
 reference-containing struct that vanishes on the return of 
 your corresponding function
I do not think it's a problem, otherwise **both programs would not work at all**.
The GC doesn't reliably punish objects that outlive their last reference, because it isn't always running. If you have a tight loop where the GC is never invoked, you can do whatever crazy things you want. Your program doesn't crash until you hit ctrl-C, after all.
 Look...
 I have added stopper into an array...

 ```d
     Stopper[] stoppers;
     auto stopper = new Stopper();
     stoppers ~= stopper;
     stopper.run();
 ```

 and, you won't believe, this have fixed the problem -
 the objects, referenced by sg0 and sg1 are not destroyed 
 anymore.
This is a sufficient patch to prevent the segfault:

```
diff --git a/echo_client.d b/echo_client.d
index 1f8270e..5ec41df 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -32,7 +32,7 @@ void main(string[] args) {
         sm.run();
     }

-    auto stopper = new Stopper();
+    scope stopper = new Stopper();
     stopper.run();

     writeln(" === Hello, world! === ");
```

The `scope` stack-allocates Stopper.

This is also a sufficient patch to prevent the segfault:

```
diff --git a/echo_client.d b/echo_client.d
index 1f8270e..0b968a8 100644
--- a/echo_client.d
+++ b/echo_client.d
@@ -39,4 +39,6 @@ void main(string[] args) {
     auto md = new MessageDispatcher();
     md.loop();
     writeln(" === Goodbye, world! === ");
+    writeln(stopper.sg0.number);
+    //writeln(stopper.sg1.number);
 }
```

Either one of those writelns will do it.

Without either of the above, STOPPER is destroyed a few seconds into a run of echo-client:

```
$ ./echo-client | grep STOPPER
'STOPPER' registered 24 (esrc.Signal)
'STOPPER' registered 25 (esrc.Signal)
'STOPPER INIT' got 'M0' from 'SELF'
'STOPPER' enabled 24 (esrc.Signal)
'STOPPER' enabled 25 (esrc.Signal)
(seconds pass)
stopper.Stopper.~this(): STOPPER destroyed
```

You can hit ctrl-C prior to Stopper's destruction and there's no segfault. (On my system, it won't show the usual 'segfault' message to the terminal when grep is filtering like that, but if you turn on coredumps you can see one is only generated with a ctrl-C after Stopper's destroyed.)

So this looks at first to me like a bug: dmd is allowing Stopper to be collected before the end of its lexical scope if it isn't used later in it. Except, forcing a collection right after `stopper.run()` doesn't destroy it.

Here's a patch that destroys Stopper almost immediately, so that a ctrl-C within milliseconds of the program starting will still segfault it. This also no longer requires the server to be active.

```
diff --git a/engine/edsm.d b/engine/edsm.d
index 513d8a5..ea9ac3a 100644
--- a/engine/edsm.d
+++ b/engine/edsm.d
@@ -176,6 +176,8 @@ class StageMachine {
             "'%s %s' got '%s' from '%s'",
             name, currentStage.name, eventName,
             m.src ? (m.src is this ? "SELF" : m.src.name) : "OS"
         );
+        import core.memory : GC;
+        GC.collect;

         if (eventName !in currentStage.reflexes) {
```

valgrind:

```
^C==14893== Thread 1:
==14893== Jump to the invalid address stated on the next line
==14893==    at 0x2: ???
==14893==    by 0x187A3C: void disp.MessageDispatcher.loop()
==14893==    by 0x1BED89: _Dmain
```

With Stopper's collection prevented and some logging around reactTo:

```
^Csi.sizeof = 128
about to react to Message(null, stopper.Stopper, 0, esrc.Signal)
'STOPPER IDLE' got 'S0' from 'OS'
goodbye, world
reacted
 === Goodbye, world! ===
1
ecap.EventQueue.~this
stopper.Stopper.~this(): STOPPER destroyed
```

So the problem here is that ctrl-C causes that message to come but Stopper's been collected and that address contains garbage. Since the Message in the MessageQueue should keep it alive, I think this is probably a bug in dmd.
Sep 19 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
 So the problem here is that ctrl-C causes that message to come 
 but Stopper's been collected and that address contains garbage.
This is exactly what I was trying to say... Thanx a lot for your in-depth investigation of the trouble! I'll try your patches later.
 Since the Message in the MessageQueue should keep it alive, I 
 think this is probably a bug in dmd.
In the starting post I noticed that

- when compiled with gdc, echo-client does not crash
- when compiled with ldc, no crash
- but when compiled with gdc -Os, same crash as with dmd.

The last was (and still is) the most confusing observation for me.
Sep 19 2021
prev sibling next sibling parent eugene <dee0xeed gmail.com> writes:
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
 This is a sufficient patch to prevent the segfault:

 ```
 diff --git a/echo_client.d b/echo_client.d
 index 1f8270e..5ec41df 100644
 --- a/echo_client.d
 +++ b/echo_client.d
    -32,7 +32,7    void main(string[] args) {
          sm.run();
      }

 -    auto stopper = new Stopper();
 +    scope stopper = new Stopper();
      stopper.run();
 ```
I tried stack allocated stopper in my second 'simple example' and... No segfault, but: http://zed.karelia.ru/0/e/oops.png As can be seen from the screenshot, destructors of sg0 and sg1 were not called, but at the very end something went completely wrong.
Sep 19 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Sunday, 19 September 2021 at 16:27:55 UTC, jfondren wrote:
 This is also a sufficient patch to prevent the segfault:

 ```
 diff --git a/echo_client.d b/echo_client.d
 index 1f8270e..0b968a8 100644
 --- a/echo_client.d
 +++ b/echo_client.d
    -39,4 +39,6    void main(string[] args) {
      auto md = new MessageDispatcher();
      md.loop();
      writeln(" === Goodbye, world! === ");
 +    writeln(stopper.sg0.number);
 +    //writeln(stopper.sg1.number);
  }
 ```
This one really helps, the program terminates as expected:

```
'MAIN IDLE' got 'T0' from 'OS'
'MAIN IDLE' got 'T0' from 'OS'
^Csi.sizeof = 128
'STOPPER IDLE' got 'S0' from 'OS'
0
 === Goodbye, world! ===
___!!!___edsm.StageMachine.~this(): MAIN destroyed...
ecap.EventQueue.~this
!!! esrc.EventSource.~this() : esrc.Timer (owner MAIN, fd = 4) this 0x7f15e6c870c0
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7f15e6c8a150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7f15e6c8a180
```
Sep 19 2021
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/14/21 10:56 AM, jfondren wrote:
 On Tuesday, 14 September 2021 at 14:40:55 UTC, eugene wrote:
 On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven Schveighoffer 
 wrote:
 This project is too big and complex
Really, "too big and complex"? It's as simple as a tabouret :) It's just a toy/hobby 'project'.
A 5-pound phone isn't "too heavy" for an adult to carry but it won't sell well. It's not just about capabilities but what efforts people are willing to expend. I would troubleshoot your issue by gradually making it safe and thinking about exceptions. One exception I didn't think about earlier was the 'misaligned pointer' one that I said I suppressed just to find the next safe complaint: https://dlang.org/spec/garbage.html says:
 Do not misalign pointers if those pointers may point into the GC heap,
So even if the lifetimes of your EventSource structs are fixed, the GC can reap the object they're pointing to. You could fix this by having a 128-bit struct and passing C an index into it, so to speak.
I don't think this is the problem. The misaligned pointers are only happening within the stack frame, along with references to the objects stored also in another parameter. So they should not cause problems with the GC. The storage of the references inside other objects is not misaligned. -Steve
Sep 14 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:09:03 UTC, Steven 
Schveighoffer wrote:
 This project is too big and complex for me to diagnose by just 
 reading, it would take some effort
take a look at https://www.routledge.com/Modeling-Software-with-Finite-State-Machines-A-Practical-Approach/Wagner-Schmuki-Wagner-Wolstenholme/p/book/9780367390860# 'Event/Message Driven State Machines' (http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz) was inspired by this nice book.
Sep 14 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
 full src is here
 http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
I've also made two simple examples, just in case.

http://zed.karelia.ru/0/e/edsm-in-d-simple-example-1.tar.gz

The program does nothing, just waits for ^C, and does not crash upon SIGINT.

Now, let's put some pressure on the garbage collector:

http://zed.karelia.ru/0/e/edsm-in-d-simple-example-2.tar.gz

Every 10 ms do some allocations:

```d
    void mainIdleEnter() {
        tm0.enable();
        tm0.heartBeat(10); // milliseconds
    }

    void mainIdleT0(StageMachine src, Object o) {
        int[] a;
        foreach (k; 0 .. 1000) {
            a ~= k;
        }
    }
```

After 3 seconds from the start the destructors are called:

    edsm-in-d-simple-example-2 $ ./test | grep owner
    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7fa267872150
    !!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7fa267872180

After this happens, pressing ^C results in a segfault.
Sep 19 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Sunday, 19 September 2021 at 20:12:45 UTC, eugene wrote:
 On Monday, 13 September 2021 at 17:54:43 UTC, eugene wrote:
 full src is here
 http://zed.karelia.ru/0/e/edsm-in-d-2021-09-10.tar.gz
I've also made two simple examples, just in case - http://zed.karelia.ru/0/e/edsm-in-d-simple-example-1.tar.gz Program does nothing, just waits for ^c, does not crash upon SIGINT. Now, let's put some pressure on garbage collector - http://zed.karelia.ru/0/e/edsm-in-d-simple-example-2.tar.gz
I rearranged the code of main() like this:

```d
void main(string[] args) {

    auto Main = new Main();
    auto stopper = new Stopper();

    Main.run();
    stopper.run();

    writeln(" === Hello, world! === ");
    auto md = new MessageDispatcher();
    md.loop();
    writeln(" === Goodbye, world! === ");
}
```

And it works correctly! Miracles... :)
Sep 19 2021
parent eugene <dee0xeed gmail.com> writes:
On Sunday, 19 September 2021 at 21:10:16 UTC, eugene wrote:
 I rearranged the code of main() like this:
Similar rearrangement fixed the echo-client as well. (I moved creation of Stopper to the very beginning of main())
Sep 20 2021
prev sibling next sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 And the most strange thing is this
It is an echo-server/echo-client pair, and it is echo-client that crashes upon SIGINT. echo-server contains very similar code:

```d
class Listener : StageMachine {

    enum ulong M0_WORK = 0;
    enum ulong M1_WORK = 1;
    enum ulong M0_GONE = 0;

    RestRoom workerPool;
    ushort port;
    TCPListener reception;
    Signal sg0, sg1;

    this(RestRoom wPool, ushort port = 1111) {
        super("LISTENER");
        workerPool = wPool;
        this.port = port;

        Stage init, work;
        init = addStage("INIT", &listenerInitEnter);
        work = addStage("WORK", &listenerWorkEnter);

        init.addReflex("M0", work);

        work.addReflex("L0", &listenerWorkL0);
        work.addReflex("M0", &listenerWorkM0);
        work.addReflex("S0", &listenerWorkS0);
        work.addReflex("S1", &listenerWorkS1);
    }

    void listenerInitEnter() {
        reception = newTCPListener(port);
        sg0 = newSignal(Signal.sigInt);
        sg1 = newSignal(Signal.sigTerm);
        msgTo(this, M0_WORK);
    }
```

but it does not crash (destruc). The only significant difference is that it has a TCPListener instance, besides absolutely the same sg0 and sg1 'channels'.
Sep 13 2021
prev sibling next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 Then after pressing ^C (SIGINT) the program gets SIGSEGV, since 
 references to sg0 and sg1 are no longer valid (they are 
 "sitting" in epoll_event structure).
    engine/ecap.d(54): Error: field `EpollEvent.es` cannot assign to misaligned pointers in `@safe` code
    engine/ecap.d(56): Error: cannot take address of local `e` in `@safe` function `registerEventSource`

from adding `@safe` to ecap.EventQueue.registerEventSource, and then from using a `@trusted` block to silence the first complaint.

Instead of using a temporary EpollEvent array in EventQueue.wait, you could make the array an instance variable and have registerEventSource populate it directly, so that the GC can always trace from this array to an EventSource.

... however, I don't think this fixes your problem, or is your only problem, since the segfault's still observed when this memory is leaked:

```d
    void registerEventSource(EventSource es) {
        import core.memory : pureMalloc, GC;

        auto p = cast(EpollEvent*) pureMalloc(EpollEvent.sizeof);
        p.event_mask = 0;
        p.es = es;
        GC.addRoot(p);
        int r = epoll_ctl(id, EPOLL_CTL_ADD, es.id, p);
        assert(r == 0, "epoll_ctl(ADD) failed");
    }
```
Sep 13 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
 Instead of using a temporary EpollEvent array in 
 EventQueue.wait, you could make the array an instance variable 
 and have registerEventSource populate it directly
Actually, initial version of all that was using array, allocated in constructor, but then (when struggling with GC) I thought that array in stack will press GC less... ... It seems I said something stupid just now )
Sep 13 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 18:45:22 UTC, jfondren wrote:
 ```d
         auto p = cast(EpollEvent*) 
 pureMalloc(EpollEvent.sizeof);
 ```
What? Allocate struct epoll_event on the heap? It is a feeble joke ;)

```c
static int ecap__add(int fd, void *dptr)
{
        struct epoll_event waitfor = {0};
        int flags, r;

        waitfor.data.ptr = dptr;

        r = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &waitfor);
        if (-1 == r) {
```

All fd's (sockets, timers etc) are added the same way and the corresponding EventSources are not destroyed by GC.
Sep 14 2021
prev sibling next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 Then after pressing ^C (SIGINT) the program gets SIGSEGV, since 
 references to sg0 and sg1 are no longer valid (they are 
 "sitting" in epoll_event structure).
... forgot to mention, it crashes here:

```d
    bool wait() {

        const int maxEvents = 8;
        EpollEvent[maxEvents] events;

        if (done)
            return false;

        int n = epoll_wait(id, events.ptr, maxEvents, -1);
        if (-1 == n)
            return false;

        foreach (k; 0 .. n) {
            EventSource s = events[k].es;
            ulong ecode = s.eventCode(events[k].event_mask); // <<<<< SIGSEGV
```

sg0/sg1 are destroyed, so s points to the wrong location.
Sep 14 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/14/21 7:31 AM, eugene wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 Then after pressing ^C (SIGINT) the program gets SIGSEGV, since 
 references to sg0 and sg1 are no longer valid (they are "sitting" in 
 epoll_event structure).
... forgot to mention, it crashes here:

```d
    bool wait() {

        const int maxEvents = 8;
        EpollEvent[maxEvents] events;

        if (done)
            return false;

        int n = epoll_wait(id, events.ptr, maxEvents, -1);
        if (-1 == n)
            return false;

        foreach (k; 0 .. n) {
            EventSource s = events[k].es;
            ulong ecode = s.eventCode(events[k].event_mask); // <<<<< SIGSEGV
```

sg0/sg1 are destroyed, so s points to the wrong location.
Note that s likely still points at a valid memory address. However, when an object is destroyed, its vtable is nulled out (precisely to cause a segfault if you try to use an already-freed object). There is also the possibility the memory block has been reallocated to something else, and that is causing the segfault. But if the segfault is consistent, most likely it's the former problem. -Steve
Sep 14 2021
parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 14 September 2021 at 12:13:15 UTC, Steven 
Schveighoffer wrote:
 On 9/14/21 7:31 AM, eugene wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
              EventSource s = events[k].es;
              ulong ecode = s.eventCode(events[k].event_mask); 
 // <<<<< SIGSEGV
Note that s likely still points at a valid memory address.
yeah, this address is obtained from OS (epoll_event struct), compiler can not zero it.
 However, when an object is destroyed, its vtable is nulled out 
 (precisely to cause a segfault if you try to use an 
 already-freed object).
that's right - calling eventCode() method results in segfault.
Sep 14 2021
prev sibling next sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 The instance of Stopper is created in the scope of main():

 ```d
 void main(string[] args) {

     auto stopper = new Stopper();
     stopper.run();
 ```
Look... I have added stopper into an array...

```d
    Stopper[] stoppers;
    auto stopper = new Stopper();
    stoppers ~= stopper;
    stopper.run();
```

and, you won't believe it, this has fixed the problem - the objects referenced by sg0 and sg1 are not destroyed anymore.

This is a much more acceptable 'solution' for me than adding that whole bunch of event sources into some array.

But I'm still puzzled - what is so special about the stopper? echo-server has its 'reception' just as a single variable and it works fine.
Sep 19 2021
prev sibling next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 I do not understand at all why GC considers those sg0 and sg1 
 as unreferenced.
 And why old gdc (without -Os) and old ldc do not.
Conclusion:

There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function and because there are apparently no other references to it that the GC can find (because the only reference is hidden inside the Linux epoll API).

More discussion:

https://forum.dlang.org/thread/siajpj$3p2$1@digitalmars.com
http://dpldocs.info/this-week-in-d/Blog.Posted_2021_09_20.html

Misaligned pointers are one way to hide objects from the GC but in this case they really weren't relevant. I just had a confused idea of the epoll API, because I'd only ever used it with a single static array that all epoll functions referenced, similarly to poll(). But actually epoll copies the event structures that you give it, and returns them on epoll_wait. That's wild.
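
Since the kernel's copy of the event structure is invisible to the GC, one way to stay safe is to keep a GC-visible reference for as long as a source is registered. What follows is only a minimal sketch of that idea, not code from the project: `SourceRegistry` and its methods are hypothetical names, `EventSource` is a stand-in for the class in the thread, and the actual `epoll_ctl` call is left as a comment.

```d
/// Hypothetical stand-in for the thread's EventSource class.
class EventSource {
    int id; // file descriptor
    this(int id) { this.id = id; }
}

/// Keeps every registered source reachable from the GC heap
/// for as long as the kernel may hand its pointer back to us.
final class SourceRegistry {
    private EventSource[int] byFd; // fd -> source; the GC scans this table

    void register(EventSource es) {
        byFd[es.id] = es;
        // epoll_ctl(epfd, EPOLL_CTL_ADD, es.id, &ev) would go here
    }

    void unregister(EventSource es) {
        // epoll_ctl(epfd, EPOLL_CTL_DEL, es.id, null) would go here
        byFd.remove(es.id);
    }

    /// Returns the source registered for `fd`, or null if there is none.
    EventSource lookup(int fd) {
        auto p = fd in byFd;
        return p ? *p : null;
    }
}
```

This only helps if the registry itself stays reachable, for example as a member of the object whose loop is still running at the end of main().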
Sep 21 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 19:42:48 UTC, jfondren wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 There's nothing special about sg0 and sg1, except that they're 
 part of Stopper. The Stopper in main() is collected before the 
 end of main() because it's not used later in the function
Okay, but how could you explain this then

```d
void main(string[] args) {

    auto Main = new Main();
    Main.run();

    auto stopper = new Stopper();
    stopper.run();
```

```
d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
'STOPPER' registered 5 (esrc.Signal)
'STOPPER' registered 6 (esrc.Signal)
'STOPPER   INIT' got 'M0' from 'SELF'
'STOPPER' enabled 5 (esrc.Signal)
'STOPPER' enabled 6 (esrc.Signal)
___!!!___edsm.StageMachine.~this(): STOPPER destroyed...
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 5) this 0x7fc9ab1a9150
!!! esrc.EventSource.~this() : esrc.Signal (owner STOPPER, fd = 6) this 0x7fc9ab1a9180
```

Now, change operation order in the main like this:

```d
void main(string[] args) {

    auto Main = new Main();
    auto stopper = new Stopper();

    Main.run();
    stopper.run();
```

```
d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
'STOPPER' registered 5 (esrc.Signal)
'STOPPER' registered 6 (esrc.Signal)
'STOPPER   INIT' got 'M0' from 'SELF'
'STOPPER' enabled 5 (esrc.Signal)
'STOPPER' enabled 6 (esrc.Signal)
```

Everything is Ok now, stopper is not collected soon after start. So the question is how this innocent looking change can affect GC behavior so much?...
 Misaligned pointers are one way to hide objects from the GC but 
 in this case they really weren't relevant.
For sure.
Sep 21 2021
next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:
 Now, change operation order in the main like this:

 ```d
 void main(string[] args) {

     auto Main = new Main();
     auto stopper = new Stopper();

     Main.run();
     stopper.run();
 ```

 ```
 d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
 'STOPPER' registered 5 (esrc.Signal)
 'STOPPER' registered 6 (esrc.Signal)
 'STOPPER   INIT' got 'M0' from 'SELF'
 'STOPPER' enabled 5 (esrc.Signal)
 'STOPPER' enabled 6 (esrc.Signal)
 ```

 Everything is Ok now,
I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions. and Java's just as aggressive about potential collection. It's just something that mostly doesn't matter until it becomes an incredibly weird bug with code like yours.
Sep 21 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 20:28:33 UTC, jfondren wrote:
 Everything is Ok now,
I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions.
I saw a thread on this forum named 'Why are so many programmers do not like GC' or something like that. After this adventure I would add my 5 cents: because (sometimes, ok) it behaves absolutely unpredictable, depending on operation order, "particular compilers/options/versions" etc.
Sep 22 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Wednesday, 22 September 2021 at 08:03:59 UTC, eugene wrote:
 On Tuesday, 21 September 2021 at 20:28:33 UTC, jfondren wrote:
 Everything is Ok now,
I don't think this is reliably OK. If you're not using Stopper later in the function, and if there are no other references to it, then the GC can collect it. It just has no obligation to collect it, so minor differences like this might prevent that from happening for particular compilers/options/versions.
I saw a thread on this forum named 'Why are so many programmers do not like GC' or something like that. After this adventure I would add my 5 cents: because (sometimes, ok) it behaves absolutely unpredictable, depending on operation order, "particular compilers/options/versions" etc.
Nondeterminism in heap collection is a very common complaint, but here we have data that is apparently on the stack and is collected nondeterministically. I can't say I like that.
Sep 22 2021
parent eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 10:05:05 UTC, jfondren wrote:
 Nondeterminism in heap collection is a very common complaint,
It is another kind of nondeterminism that is usually complained about ("*sometime* in the future GC will collect if it wants" or so)
 but here we have data is that apparently on the stack that is 
 collected nondeterministically. I can't say I like that.
Exactly, so we've finally come to an agreement, great! :)
Sep 22 2021
prev sibling next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:

 Now, change operation order in the main like this:
Actually, all proposed 'fixes'

- use stopper somehow in the end (writeln(stopper.sg0.number))
- change operation order
- etc

are strange. I mean it's strange (for me) that these fixes make the garbage collector behave as needed.
Sep 21 2021
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Sep 21, 2021 at 08:36:49PM +0000, eugene via Digitalmars-d-learn wrote:
 On Tuesday, 21 September 2021 at 20:17:15 UTC, eugene wrote:
 
 Now, change operation order in the main like this:
Actually, all proposed 'fixes' - use stopper somehow in the end (writeln(stopper.sg0.number)) - change operation order - etc are strange. I mean it's strange (for me) that these fixes make garbage collector behave as needed.
It's not strange. You're seeing these problems because you failed to inform the GC about the dependency between Main and stopper. So it's free to assume that these are two independent, unrelated objects, and therefore it can collect either one as soon as there are no more references to it.

And since stopper isn't used anymore after declaration, an optimizing compiler is free to assume that it's not needed afterwards, so it's not obligated to keep the reference alive until the end of the function.

Since in actuality there *is* a dependency between these objects, the most "correct" solution is to include a reference to stopper somewhere in Main. Then the GC would be guaranteed never to collect stopper before Main becomes unreferenced.

T

-- 
Век живи - век учись. А дураком помрёшь.
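
To illustrate what "include a reference somewhere" could look like, here is a small sketch. It is not the project's code: `Machine` and `Dispatcher` are hypothetical stand-ins, and it assumes the owning object is the one whose loop runs until the end of main(), so that the owner itself stays referenced.

```d
/// Hypothetical stand-in for the thread's StageMachine.
class Machine {
    string name;
    this(string name) { this.name = name; }
}

/// Hypothetical dispatcher that owns every machine it may deliver to.
final class Dispatcher {
    private Machine[] machines; // GC-visible references to all machines

    /// Records the machine and hands it back to the caller.
    T adopt(T : Machine)(T m) {
        machines ~= m;
        return m;
    }

    /// Number of machines currently owned.
    size_t count() const { return machines.length; }
}
```

With something like this, main() would create the dispatcher first and write `auto stopper = md.adopt(new Stopper());`, so the machine stays reachable through `md` for as long as the loop is running.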
Sep 21 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 20:47:41 UTC, H. S. Teoh wrote:
 And since stopper isn't used anymore after declaration, an 
 optimizing compiler is free to assume that it's not needed 
 afterwards, so it's not obligated to keep the reference alive 
 until the end of the function.
It was not obvious for me, I thought lifetimes always lasts until the end of a scope (main in this case).
Sep 21 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 20:47:41 UTC, H. S. Teoh wrote:
 Век живи - век учись. А дураком помрёшь.
:) "Век живи - век учись, всё равно дураком помрёшь." is the correct version ("Live a century, learn a century, you'll die a fool all the same"). :)
Sep 21 2021
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Sep 21, 2021 at 08:17:15PM +0000, eugene via Digitalmars-d-learn wrote:
[...]
 ```d
 void main(string[] args) {
 
     auto Main = new Main();
     Main.run();
 
     auto stopper = new Stopper();
     stopper.run();
 ```
[...]
 ```d
 void main(string[] args) {
 
     auto Main = new Main();
     auto stopper = new Stopper();
 
     Main.run();
     stopper.run();
 ```
[...]
 Everything is Ok now, stopper is not collected soon after start.
 So the question is how this innocent looking change can affect GC
 behavior so much?...
In the first example, the compiler sees that the lifetime of Main is disjoint from the lifetime of stopper, so it's free to reuse the same stack space (or register(s)) to store both variables. (This is a pretty standard optimization FYI.) So the line `auto stopper = new Stopper();` would overwrite the reference to Main, and the GC would see Main as an unreferenced object and may collect it at any point after the line `Main.run();`.

In the second case, since the lifetimes of Main and stopper overlap, the compiler (probably) conservatively assumes that their lifetimes last until the end of the function, and so reserves disjoint places for them on the stack. This does not mean you're 100% safe, however. A sufficiently optimizing compiler may determine that since Main and stopper are independent, it is free to reorder the code such that the two lifetimes are independent, and therefore end up with the same situation as the first example.

If Main really depends on the existence of stopper, I'd argue that it really should store a reference to stopper somewhere, so that as long as Main is not unreferenced the GC would not collect stopper.

T

-- 
What's an anagram of "BANACH-TARSKI"? BANACH-TARSKI BANACH-TARSKI.
Sep 21 2021
parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 21 September 2021 at 20:42:12 UTC, H. S. Teoh wrote:
 A sufficiently optimizing compiler may determine that since 
 Main and stopper are independent, it is free to reorder the 
 code such that the two lifetimes are independent, and therefore 
 end up with the same situation as the first example.
In other words, compiler is trying to be smarter than a programmer :) With a poor result... But... it is **main** function, after all! Maybe, main() should be an exception when performing that 'smart' optimizations? ;) Btw, is there any dmd option for turning all/some optimizations off? Or some 'pragma/attribute'?
Sep 22 2021
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/21/21 4:17 PM, eugene wrote:
 On Tuesday, 21 September 2021 at 19:42:48 UTC, jfondren wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 There's nothing special about sg0 and sg1, except that they're part of 
 Stopper. The Stopper in main() is collected before the end of main() 
 because it's not used later in the function
Okay, but how could you explain this then ```d void main(string[] args) {     auto Main = new Main();     Main.run();     auto stopper = new Stopper();     stopper.run(); ```
Here is what is happening.

The compiler keeps track of how long it needs to keep `stopper` around. In assembly, the `new Stopper()` call is a function which returns in a register. On the very next instruction, you are calling the function `stopper.run` where it needs the value of the register (either pushed into an argument register, or put on the call stack, depending on the ABI).

Either way, this is the last time in the function the value `stopper` is needed. Therefore, it does not store it on the stack frame of `main`. This is an optimization, but one that is taken even without optimizations enabled in some compilers. It's called [dead store elimination](https://en.wikipedia.org/wiki/Dead_store).

Since the register is overwritten by subsequent function calls, there no longer exists a reference to `stopper`, and it gets collected (along with the members that are only referenced via `stopper`).
 
 Now, change operation order in the main like this:
 
 ```d
 void main(string[] args) {
 
      auto Main = new Main();
      auto stopper = new Stopper();
 
      Main.run();
      stopper.run();
 ```
 
 ```
 d-lang/edsm-in-d-simple-example-2 $ ./test | grep STOPPER
 'STOPPER' registered 5 (esrc.Signal)
 'STOPPER' registered 6 (esrc.Signal)
 'STOPPER   INIT' got 'M0' from 'SELF'
 'STOPPER' enabled 5 (esrc.Signal)
 'STOPPER' enabled 6 (esrc.Signal)
 ```
 
 Everything is Ok now, stopper is not collected soon after start.
 So the question is how this innocent looking change
 can affect GC behavior so much?...
In this case, at the point you call `Main.run`, `stopper` is only in a register. Yet, it's needed later, so the compiler has no choice but to put `stopper` on the stack so it has access to it to call `stopper.run`. If it didn't, it's likely that `Main.run` will overwrite that register.

Once it's on the stack, the GC can see it for the full run of `main`. This is why this case is different.

Note that Java is even more aggressive, and might *still* collect it, because it could legitimately set `stopper` to null after the last use to signify that it's no longer needed. I don't anticipate D doing this though.

I recommend you read the blog post, it has details on how this is happening.

How do you fix it? I have proposed a possible solution, but I'm not sure if it's completely sound, see [here](https://forum.dlang.org/post/sichju$2gth$1@digitalmars.com). It may be that this works today, but a future more clever compiler can potentially see through this trick and still not store the pinned value.

I think the spec is wrong to say that just storing something as a local variable should solve the problem. We should follow the lead of other GC-supporting languages, and provide a mechanism to ensure a pointer is scannable by the GC through the entire scope.

-Steve
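
For illustration only, and not the solution from the linked post: a blunter way to keep an object alive regardless of what the compiler does with the local variable is to register it as a GC root for the duration of some work. `GC.addRoot` and `GC.removeRoot` are real functions in `core.memory`; `withPinned` and the tiny `Stopper` class below are hypothetical names made up for this sketch.

```d
import core.memory : GC;

/// Hypothetical stand-in for the thread's Stopper.
class Stopper {
    bool running;
    void run() { running = true; }
}

/// Registers `obj` as a GC root, runs `work`, then unregisters it.
/// While it is a root, the object is never collected, even if no
/// stack slot or register refers to it anymore.
void withPinned(Object obj, scope void delegate() work) {
    auto p = cast(void*) obj;
    GC.addRoot(p);
    scope (exit) GC.removeRoot(p);
    work();
}
```

In main() this would wrap the dispatcher loop, e.g. `withPinned(stopper, { md.loop(); });`. The trade-off is that pinning is manual: forgetting `removeRoot` leaks the object, which is why the helper does it in `scope (exit)`.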
Sep 22 2021
next sibling parent eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven 
Schveighoffer wrote:
 Here is what is happening.
Many thanks for this so exhaustive explanation!
Sep 22 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven 
Schveighoffer wrote:
 Once it's on the stack, the GC can see it for the full run of 
 `main`. This is why this case is different.

 Note that Java is even more aggressive, and might *still* 
 collect it, because it could legitimately set `stopper` to null 
 after the last use to signify that it's no longer needed.
And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?
Sep 22 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/22/21 8:22 AM, eugene wrote:
 On Wednesday, 22 September 2021 at 11:44:16 UTC, Steven Schveighoffer 
 wrote:
 Once it's on the stack, the GC can see it for the full run of `main`. 
 This is why this case is different.

 Note that Java is even more aggressive, and might *still* collect it, 
 because it could legitimately set `stopper` to null after the last use 
 to signify that it's no longer needed.
And it follows that programming in GC-supporting languages *may* be harder than in languages with manual memory management, right?
Only when interfacing with C ;) Which admittedly is a stated goal for D. It's telling that I've been using D for 14 years and never had or seen this problem. -Steve
Sep 22 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 12:26:53 UTC, Steven 
Schveighoffer wrote:
 On 9/22/21 8:22 AM, eugene wrote:
 And it follows that programming in GC-supporting languages
 *may* be harder than in languages with manual memory
 management, right?
I meant my this particular trouble... I do not want to understand how and what compiler generates, I just want to get working program without any oddities. Nevertheless, thank you again for your nice explanation!
 Only when interfacing with C ;) Which admittedly is a stated 
 goal for D.
I know. And this is just fine to have the ability of using libc (especially system calls) without
 It's telling that I've been using D for 14 years and never had 
 or seen this problem.
Bond. James Bond. :) 25 years of C coding.

Now, imagine the shock I was under when I 'discovered' that swapping two lines of code can magically fix my prog and make GC do the right thing. :)

Actually, D is a nice language per se and I truly wish it to be as popular as java/python/etc. But these GC ... mmm... 'features' may reduce to zero any wish to learn D, that's about it.

When my C program crashes, I'm 100% sure I made something stupid

- forgot to initialize a pointer, easy to find and fix
- did some memory corruption (worse, but then electric fence is my best friend)

But if a crash is caused by 'optimization' + GC... It looks like a programmer must keep some implicit/unwritten rules in order to write correctly...
Sep 22 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/22/21 11:47 AM, eugene wrote:
 On Wednesday, 22 September 2021 at 12:26:53 UTC, Steven Schveighoffer 
 wrote:
 On 9/22/21 8:22 AM, eugene wrote:
 And it follows that programming in GC-supporting languages
 *may* be harder than in languages with manual memory
 management, right?
I meant my this particular trouble...
In terms of any kind of memory management, whether it be ARC, manual, GC, or anything else, there will always be pitfalls. It's just that you have to get used to the pitfalls and how to avoid them. I could see a person used to GC complaining that C requires you to free every pointer *exactly once*. I mean, how can that be acceptable? ;)
 I do not want to understand how and what
 compiler generates, I just want to
 get working program without any oddities.
And for the most part, you do not. It's just when you travel outside the language, you must obey certain constraints. Those constraints are laid out, and unfortunately not exactly correct (we need to amend the spec/library for this), but given correct constraints, the rules are not super-difficult to follow.
 Nevertheless, thank you again for your nice explanation!
You are welcome!
 It's telling that I've been using D for 14 years and never had or seen 
 this problem.
Bond. James Bond. :) 25 years of C coding. Now, imagine the shock I was under when I 'discovered' that swapping two lines of code can magically fix my prog and make GC do the right thing. :)
I'm right there with you (been writing code for about 25 years, maybe 26, depending on when I switched majors to CS in college). But realize that C has its share of "shocks" as well, you are just more used to them (or maybe you have been lucky so far?)
 
 Actually, D is a nice language per se
 and I truly wish it to be as popular as java/python/etc.
 But these GC ... mmm... 'features' may reduce
 to zero any wish to learn D, that's about it.
Your experience is not typical though (clearly, as many of us long-time D users had no idea why it was happening). But for sure if this turns you off, I can understand how it can be too frustrating to learn the new rules. I personally would probably never write C code again if I can help it, despite having decades of experience in C/C++. I did recently have to port a C plugin library from PHP 5 to PHP 7, and it wasn't pleasant.
 
 When my C program crashes, I'm 100% sure I made something stupid
 
 - forget to initialize a pointer, easy to find and fix
 - did some memory corruption (worse, but then electric fence is my best 
 friend)
 
 But if a crash is caused by 'optimization' + GC...
 It looks like a programmer must keep some
 implicit/unwritten rules in order to write correctly...
 
I find it interesting how you blame yourself for C's idiosyncrasies, but not for D's ;) I would say C has far more pitfalls than D. Check out the undefined behaviors for C. -Steve
Sep 22 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven 
Schveighoffer wrote:
 Your experience is not typical though (clearly, as many of us 
 long-time D users had no idea why it was happening).
Oh, yeah - I have special trait of bumping against various low probability things :)
 But for sure if this turns you off, I can understand how it can 
 be too frustrating to learn the new rules.
Show me these rules! Always use an object at the end of a function? Make a second reference to an object somewhere on the heap? The 'problem' here is that there is no clear rule. Any reasonable 'hack' will do.
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 3:27 AM, eugene wrote:
 On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer 
 wrote:
 Your experience is not typical though (clearly, as many of us 
 long-time D users had no idea why it was happening).
Oh, yeah - I have special trait of bumping against various low probability things :)
 But for sure if this turns you off, I can understand how it can be too 
 frustrating to learn the new rules.
Show me these rules!
They are here: https://dlang.org/spec/interfaceToC.html#storage_allocation With the caveat, of course, that the recommendation to "leave a pointer on the stack" is not as easy to follow as one might think with the optimizer fighting against that. We need to add a better way to do that (similar to C#'s KeepAlive). I made an [attempt](https://code.dlang.org/packages/keepalive), but I think it's not guaranteed to work; I've already found ways to prove it fails. -Steve
Sep 23 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 12:53:14 UTC, Steven 
Schveighoffer wrote:
 Show me these rules!
They are here: https://dlang.org/spec/interfaceToC.html#storage_allocation With the caveat, of course, that the recommendation to "leave a pointer on the stack" is not as easy to follow as one might think with the optimizer fighting against that.
Yes, as you explained to me, the root of the problem in my examples was dead store elimination.

 KeepAlive).
Do you mean some function attribute?..
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 9:18 AM, eugene wrote:
 On Thursday, 23 September 2021 at 12:53:14 UTC, Steven Schveighoffer wrote:

Do you mean some function attribute?..
suggested -- use the object later. However, they are recognized by the compiler as an intrinsic which generates no code or side effects, but is not subject to elimination by the optimizer. See more details: https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks -Steve
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 15:56:16 UTC, Steven 
Schveighoffer wrote:
 See more details:

 https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks
" This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available. " **Code this method at the end...** :) it is the same as proposed by jfondren simple writeln(stopper.sg0.number) in the end of main, right?
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 12:58 PM, eugene wrote:
 On Thursday, 23 September 2021 at 15:56:16 UTC, Steven Schveighoffer wrote:
 See more details:

 https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks
" This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available. " **Code this method at the end...** :) it is the same as proposed by jfondren simple writeln(stopper.sg0.number) in the end of main, right?
Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything. Essentially, you get the side effect of keeping the object live, without paying the penalty of inserting frivolous code. All my efforts to achieve the same via a library were thwarted by at least LDC (whose optimizer is very good). The only possible solution I can think of is to generate an opaque function that LDC cannot see into, in order to force it to avoid inlining, and have that function do nothing. However, there's always Link-Time Optimization... -Steve
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 17:16:23 UTC, Steven 
Schveighoffer wrote:
 On 9/23/21 12:58 PM, eugene wrote:
 On Thursday, 23 September 2021 at 15:56:16 UTC, Steven 
 Schveighoffer wrote:
 See more details:

 https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks
" This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available. " **Code this method at the end...** :) it is the same as proposed by jfondren simple writeln(stopper.sg0.number) in the end of main, right?
Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything.
```d
void keepAlive(Object o) { }

void main(string[] args) {
    import core.memory : GC;
    auto Main = new Main();
    Main.run();
    auto stopper = new Stopper();
    stopper.run();
    writeln(" === Hello, world! === ");
    auto md = new MessageDispatcher();
    md.loop();
    keepAlive(Main);
    keepAlive(stopper);
    writeln(" === Goodbye, world! === ");
}
```

works ok with dmd, stopper is not collected.
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 2:18 PM, eugene wrote:
 On Thursday, 23 September 2021 at 17:16:23 UTC, Steven Schveighoffer wrote:
 On 9/23/21 12:58 PM, eugene wrote:
 On Thursday, 23 September 2021 at 15:56:16 UTC, Steven Schveighoffer 
 wrote:
 See more details:

 https://docs.microsoft.com/en-us/dotnet/api/system.gc.keepalive?view=net-5.0#remarks
" This method references the obj parameter, making that object ineligible for garbage collection from the start of the routine to the point, in execution order, where this method is called. Code this method at the end, not the beginning, of the range of instructions where obj must be available. " **Code this method at the end...** :) it is the same as proposed by jfondren simple writeln(stopper.sg0.number) in the end of main, right?
Same effect, but writeln actually executes code to write data to the console, whereas KeepAlive doesn't do anything.
```d
void keepAlive(Object o) { }

void main(string[] args) {
    import core.memory : GC;
    auto Main = new Main();
    Main.run();
    auto stopper = new Stopper();
    stopper.run();
    writeln(" === Hello, world! === ");
    auto md = new MessageDispatcher();
    md.loop();
    keepAlive(Main);
    keepAlive(stopper);
    writeln(" === Goodbye, world! === ");
}
```

works ok with dmd, stopper is not collected.
With dmd -O -inline, there is a chance it will be collected. Inlining is key here. -Steve
Sep 23 2021
parent eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 18:43:36 UTC, Steven 
Schveighoffer wrote:
 With dmd -O -inline, there is a chance it will be collected. 
 Inlining is key here.
never mind, GC.addRoot() looks more trustworthy, anyway :)
Sep 23 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 12:53:14 UTC, Steven 
Schveighoffer wrote:
 With the caveat, of course, that the recommendation to "leave a 
 pointer on the stack" is not as easy to follow as one might 
 think with the optimizer fighting against that. We need to add 

 [attempt](https://code.dlang.org/packages/keepalive), but I 
 think it's not guaranteed to work, I've already found ways to 
 prove it fails.
For the moment I am personally quite happy with any reasonable workaround (use an object in the end of the main function, put the reference to an object into some AA, whatever), because now I firmly understand, that the source of strange GC behavior is DSE optimization (in this case).
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 14:00:30 UTC, eugene wrote:

 For the moment I am personally quite happy
```d
void main(string[] args) {
    import core.memory : GC;

    auto Main = new Main();
    GC.addRoot(cast(void*)Main);
    Main.run();

    auto stopper = new Stopper();
    GC.addRoot(cast(void*)stopper);
    stopper.run();
```

Fine, works!
Sep 23 2021
parent reply jfondren <julian.fondren gmail.com> writes:
On Thursday, 23 September 2021 at 14:23:40 UTC, eugene wrote:
 On Thursday, 23 September 2021 at 14:00:30 UTC, eugene wrote:

 For the moment I am personally quite happy
```d
void main(string[] args) {
    import core.memory : GC;

    auto Main = new Main();
    GC.addRoot(cast(void*)Main);
    Main.run();

    auto stopper = new Stopper();
    GC.addRoot(cast(void*)stopper);
    stopper.run();
```

Fine, works!
Nice. I thought of GC.addRoot several times but I was distracted by the general solution of using object lifetimes with it, so that a struct's destructor would call GC.removeRoot. For your case just pinning these and forgetting about them is the easiest way to do it.
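The lifetime-based variant mentioned here (a struct whose destructor calls `GC.removeRoot`) could be sketched roughly as follows. This is only an illustration: `Pinned`, `Machine` and `pinnedDemo` are hypothetical names, not part of druntime or of the program discussed in this thread; only `GC.addRoot`, `GC.removeRoot` and `GC.collect` come from `core.memory`.

```d
import core.memory : GC;

/// Hypothetical RAII helper: registers a class instance as a GC root on
/// construction and unregisters it when the struct goes out of scope.
struct Pinned(T) if (is(T == class))
{
    private T obj;

    @disable this();       // a Pinned must always wrap an object
    @disable this(this);   // a copy would call removeRoot twice

    this(T o)
    {
        obj = o;
        GC.addRoot(cast(void*) obj);
    }

    ~this()
    {
        if (obj !is null)
            GC.removeRoot(cast(void*) obj);
    }

    T get() { return obj; }
    alias get this;        // forward member access to the wrapped object
}

/// Hypothetical stand-in for a state machine such as Stopper.
class Machine
{
    int runs;
    void run() { ++runs; }
}

int pinnedDemo()
{
    auto m = Pinned!Machine(new Machine());
    m.run();          // forwarded through alias this
    GC.collect();     // the registered root keeps the object alive
    return m.runs;
}

void main()
{
    assert(pinnedDemo() == 1);
}
```

With such a wrapper the root is dropped automatically at scope exit, whereas the plain `GC.addRoot` calls above pin the objects for the rest of the process, which is all that is needed for machines that live until the end of `main()`.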
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 14:31:34 UTC, jfondren wrote:
 Nice. I thought of GC.addRoot several times but I was 
 distracted by the general solution of using object lifetimes 
 with it, so that a struct's destructor would call 
 GC.removeRoot. For your case just pinning these and forgetting 
 about them is the easiest way to do it.
Yes, these two must live until the end of main(). Moreover, in real (C) programs I (usually) do not create state machines on the fly, instead I keep them in pools, like RX/TX machines pools in echo-server and in echo-client.
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 10:55 AM, eugene wrote:
 On Thursday, 23 September 2021 at 14:31:34 UTC, jfondren wrote:
 Nice. I thought of GC.addRoot several times but I was distracted by 
 the general solution of using object lifetimes with it, so that a 
 struct's destructor would call GC.removeRoot. For your case just 
 pinning these and forgetting about them is the easiest way to do it.
Yes, these two must live until the end of main(). Moreover, in real (C) programs I (usually) do not create state machines on the fly, instead I keep them in pools, like RX/TX machines pools in echo-server and in echo-client.
Technically, they should live past the end of main, because it's still possible to receive signals then. But the chances of someone hitting ctrl-c in that window are quite small. -Steve
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 15:53:37 UTC, Steven 
Schveighoffer wrote:
 Technically, they should live past the end of main, because 
 it's still possible to receive signals then.
No, as soon as an application gets SIGTERM/SIGINT, the event queue is stopped and we do not need any more notifications from the OS (POLLIN/POLLOUT I mean). Stopping the event queue in this case is just closing the file descriptor obtained from epoll_create(). After this, getting POLLIN from any fd (including the signal fd) is just impossible.
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 12:53 PM, eugene wrote:
 On Thursday, 23 September 2021 at 15:53:37 UTC, Steven Schveighoffer wrote:
 Technically, they should live past the end of main, because it's still 
 possible to receive signals then.
No, as soon as an application get SIGTERM/SIGINT, event queue is stopped and we do not need no more notifications from OS (POLLIN/POLLOUT I mean). Stopping event queue in this case is just closing file descriptor obtained from epoll_create(). After this getting POLLIN from any fd (including signal fd) is just impossible.
That's not what is triggering the segfault though. The segfault is triggered by the signal handler referencing the destroyed object.

So imagine the sequence:

1. ctrl-c, signal handler triggers, shutting down the loop
2. main exits
3. GC finalizes all objects, including the Stopper and its members
4. ctrl-c happens again, but you didn't unregister the signal handler, so it's run again, referencing the now-deleted object.
5. segfault

It's theoretically a very very small window.

-Steve
Sep 23 2021
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 17:20:18 UTC, Steven 
Schveighoffer wrote:
 So imagine the sequence:
With ease!
 1. ctrl-c, signal handler triggers, shutting down the loop
Just a note: there is no 'signal handler' in the program. SIGINT/SIGTERM are **blocked**, notifications (POLLIN) are received via epoll_wait().
 2. main exits
 3. GC finalizes all objects, including the Stopper and it's 
 members
Probably, a destructor for Signal class should be added, in which

- close fd, obtained from signalfd()
- unblock the signal (thus default signal handler is back again)
 4. ctrl-c happens again, but you didn't unregister the signal 
 handler, so it's run again, referencing the now-deleted object.
At this point we have default signal handler
 5. segfault
 It's theoretically a very very small window.
But even without destructor, no segfault will happen, because **there is no signal handler**
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 1:44 PM, eugene wrote:
 On Thursday, 23 September 2021 at 17:20:18 UTC, Steven Schveighoffer wrote:
 So imagine the sequence:
With ease!
 1. ctrl-c, signal handler triggers, shutting down the loop
Just a note: there is no 'signal handler' in the program. SIGINT/SIGTERM are **blocked**, notifications (POLLIN) are received via epoll_wait().
Oh interesting! I didn't read the code closely enough.
 
 2. main exits
 3. GC finalizes all objects, including the Stopper and it's members
Probably, a destructor for Signal class should be added, in which

- close fd, obtained from signalfd()
- unblock the signal (thus default signal handler is back again)
Yes, I would recommend that. Always good for a destructor to clean up any non-GC resources that haven't already been cleaned up. That's actually what class destructors are for.
 
 4. ctrl-c happens again, but you didn't unregister the signal handler, 
 so it's run again, referencing the now-deleted object.
At this point we have default signal handler
 5. segfault
 It's theoretically a very very small window.
But even without destructor, no segfault will happen, because **there is no signal handler**
So it gets written to the file descriptor instead? And nobody is there reading it, so it's just closed along with the process? I've not done signals this way, it seems pretty clever and less prone to asynchronous issues. -Steve
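The destructor proposed above (close the fd obtained from `signalfd()`, then unblock the signal) might look roughly like this. It is a sketch under assumptions, Linux only: `SignalSource`, `release`, `isOpen` and `isBlocked` are hypothetical names, and the class is reduced to resource handling, so it is not the `Signal` class from the program under discussion.

```d
import core.stdc.signal : SIGINT;
import core.sys.linux.sys.signalfd : signalfd, SFD_CLOEXEC;
import core.sys.posix.signal : sigset_t, sigemptyset, sigaddset, sigismember,
    sigprocmask, SIG_BLOCK, SIG_UNBLOCK;
import core.sys.posix.unistd : close;

/// Hypothetical, reduced stand-in for the thread's Signal class.
final class SignalSource
{
    private int fd = -1;
    private int signo;

    this(int signo)
    {
        this.signo = signo;
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signo);
        sigprocmask(SIG_BLOCK, &set, null);   // block the signal
        fd = signalfd(-1, &set, SFD_CLOEXEC); // receive it through an fd instead
    }

    bool isOpen() const { return fd >= 0; }

    /// Explicit cleanup; safe to call more than once.
    void release()
    {
        if (fd < 0)
            return;
        close(fd);
        fd = -1;
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signo);
        sigprocmask(SIG_UNBLOCK, &set, null); // default disposition applies again
    }

    /// Touches only this object's own fields, never other GC-managed objects.
    ~this() { release(); }
}

/// Reports whether `signo` is in the current signal mask.
bool isBlocked(int signo)
{
    sigset_t cur;
    sigemptyset(&cur);
    sigprocmask(SIG_BLOCK, null, &cur); // null set: query only, mask unchanged
    return sigismember(&cur, signo) == 1;
}

void main()
{
    auto s = new SignalSource(SIGINT);
    assert(s.isOpen && isBlocked(SIGINT));
    s.release();
    assert(!s.isOpen && !isBlocked(SIGINT));
}
```

Keeping the cleanup in a separate `release()` method lets the owner run it deterministically, with the destructor acting only as a fallback when the GC finalizes the object.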
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 18:53:25 UTC, Steven 
Schveighoffer wrote:
 On 9/23/21 1:44 PM, eugene wrote:
 Just a note: there is no 'signal handler' in the program.
 SIGINT/SIGTERM are **blocked**, notifications (POLLIN) are 
 received via epoll_wait().
Oh interesting! I didn't read the code closely enough.
"everything in Unix is a file" (c) All event sources (sockets, timers, signals, file system events) can be 'routed' through i/o multiplexing facilities, like select/poll(posix)/epoll(linux)/kqueue(freebsd) etc.
 Probably, a destructor for Signal class should be added, in 
 which
 Yes, I would recommend that. Always good for a destructor to 
 clean up any non-GC resources that haven't already been cleaned 
 up. That's actually what class destructors are for.
No, destructors are not necessary, since after SIGINT/SIGTERM the program is about to terminate and all resources will be released anyway. In C I do it the same way - do not close fds which live from start to end, do not free() pointers and so on, no need.
 So it gets written to the file descriptor instead?
When signal happens (or timer expires, or file is deleted) process get EPOLLIN on corresponding file descriptor via epoll_wait() and then process has to read some info from these file descriptors.
 And nobody is there reading it, so it's just closed along with 
 the process?
Yes, as any other file descriptor.
 I've not done signals this way, it seems pretty clever and less 
 prone to asynchronous issues.
It's just great, thanks to Linux kernel developers. Look in to engine dir in the source. C (more elaborated) variant: http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz
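The mechanism described in this post (signal blocked, notification arriving as EPOLLIN on a descriptor from `signalfd()`) can be sketched in a few lines of D. This is a minimal, Linux-only illustration rather than code from the engine; `receiveViaSignalfd` is a hypothetical helper, and `raise()` merely stands in for the user pressing ^C.

```d
import core.stdc.signal : raise, SIGINT;
import core.sys.linux.epoll;
import core.sys.linux.sys.signalfd : signalfd, signalfd_siginfo, SFD_CLOEXEC;
import core.sys.posix.signal : sigset_t, sigemptyset, sigaddset, sigprocmask,
    SIG_BLOCK;
import core.sys.posix.unistd : read, close;

/// Blocks `signo`, raises it once, and returns the signal number that
/// arrives through epoll_wait() + read() on a signalfd (-1 on any error).
int receiveViaSignalfd(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    // A blocked signal is not delivered asynchronously; it stays pending.
    if (sigprocmask(SIG_BLOCK, &set, null) != 0)
        return -1;

    int sfd = signalfd(-1, &set, SFD_CLOEXEC);
    if (sfd < 0)
        return -1;
    scope (exit) close(sfd);

    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return -1;
    scope (exit) close(ep);

    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = sfd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sfd, &ev) != 0)
        return -1;

    raise(signo); // stands in for the user pressing ^C

    epoll_event got;
    if (epoll_wait(ep, &got, 1, 1000) != 1) // wait at most one second
        return -1;

    signalfd_siginfo info;
    if (read(got.data.fd, &info, info.sizeof) != info.sizeof)
        return -1;
    return cast(int) info.ssi_signo;
}

void main()
{
    assert(receiveViaSignalfd(SIGINT) == SIGINT);
}
```

Note that the sketch stores the plain fd in `ev.data.fd`. The program in this thread keeps object references in the `epoll_event` structure instead, which is exactly the place the GC does not scan.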
Sep 23 2021
parent eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 19:32:12 UTC, eugene wrote:
 C (more elaborated) variant:
 http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz
Sound, GUI? Easy, see http://zed.karelia.ru/mmedia/bin/xjiss4.tar.gz It's a computer keyboard 'piano', based on the same engine.

As I've already mentioned, I was inspired several years ago by a very nice book, 'Modeling Software with Finite State Machines: A Practical Approach' by Wagner F. et al. I applied some ideas from the book to the Posix/Linux API and developed EDSM - 'Event driven state machines' - and since then I have not needed libev, libevent and the like.

State machines per se are a very powerful methodology to model program behavior. Also notice that machines communicate with each other by messages - remember Alan Kay's main OOP principle? It's message exchange, not class hierarchy (inheritance blah-blah-blah).

As to the client-server echo pair in D, it is my 3rd (and most successful!!!) attempt to re-implement EDSM in some 'modern' language. Initially my criteria for choosing a lang were:

- compiles to native code, goodbye Java
- no garbage collector, goodbye almost all :)
- no classes, only interfaces.
- maybe, something else, do not remember

The only language that fit my initial desires was Rust. But the borrow checker (and especially <'a>, explicit lifetimes) appeared to be the real hell for me. One can use raw pointers instead of references, but then your code is from head to toe in unsafe {} blocks. Here, I was not able to make signals work properly and I dropped it.

After reading some texts a la 'alternatives to c++', I decided to try D, despite its 'unpopularity'.
Sep 23 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 17:20:18 UTC, Steven 
Schveighoffer wrote:
 1. ctrl-c, signal handler triggers, shutting down the loop
 2. main exits
 3. GC finalizes all objects, including the Stopper and it's 
 members
but both SIGINT and SIGTERM are still **blocked**, they just will not reach the process.
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 17:49:43 UTC, eugene wrote:
 On Thursday, 23 September 2021 at 17:20:18 UTC, Steven 
 Schveighoffer wrote:
 1. ctrl-c, signal handler triggers, shutting down the loop
 2. main exits
 3. GC finalizes all objects, including the Stopper and it's 
 members
but both SIGINT and SIGTERM are still **blocked**, they just will not reach the process.
oops.. closing epoll fd should be moved from EventQueue dtor to stop() method, then everything will be Ok.
Sep 23 2021
parent eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 17:53:00 UTC, eugene wrote:
 On Thursday, 23 September 2021 at 17:49:43 UTC, eugene wrote:
 On Thursday, 23 September 2021 at 17:20:18 UTC, Steven 
 Schveighoffer wrote:
 1. ctrl-c, signal handler triggers, shutting down the loop
 2. main exits
 3. GC finalizes all objects, including the Stopper and it's 
 members
but both SIGINT and SIGTERM are still **blocked**, they just will not reach the process.
oops..
no oops, that's all right.

- when creating a Signal instance, the corresponding signal becomes blocked

```d
final class Signal : EventSource {

    enum int sigInt = SIGINT;
    enum int sigTerm = SIGTERM;
    ulong number;

    this(int signo) {

        super('S');

        sigset_t sset;
        sigset_t old_sset;
        /* block the signal */
        sigemptyset(&sset);
        sigaddset(&sset, signo);
        sigprocmask(SIG_BLOCK, &sset, &old_sset);

        id = signalfd(-1, &sset, SFD_CLOEXEC);
```

- upon receiving SIGINT stopperIdleS0() is called. now stop variable of EventQueue is false
- next call to wait() method just returns. (remember, signals are still blocked)
Sep 23 2021
prev sibling next sibling parent eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven 
Schveighoffer wrote:
 In terms of any kind of memory management, whether it be ARC, 
 manual, GC, or anything else, there will always be pitfalls. 
 It's just that you have to get used to the pitfalls and how to 
 avoid them.
100% agree.
 I could see a person used to GC complaining that C requires you 
 to free every pointer *exactly once*.
C (at compiler level) does not require this. You can do it free()ly. ,) With a subsequent... yes, 'shock' after you see 'double free or corruption' message. Tell that imaginary person this: `if (p) { free(p); p = NULL; }`, that's all.
Sep 23 2021
prev sibling next sibling parent eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven 
Schveighoffer wrote:
 But realize that C has it's share of "shocks" as well
Any language is just an instrument, most of the 'shocks' come not from languages themselves, but from the 'environment', so to say. An example that came to mind... Did you know that sending data via write()/send() to a socket that is in CLOSE_WAIT state results in sending data to nowhere, and write() indicates no error? By the way, GC is a sort of 'environment' (not the language itself), it acts behind the scenes (unless you are using it directly)
 you are just more used to them (or maybe you have been lucky so 
 far?)
Once I've been very 'lucky' with unaligned pointer dereference on ARM...
Sep 23 2021
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven 
Schveighoffer wrote:
 I find it interesting how you blame yourself for C's 
 idiosyncrasies
Me? Blaming *myself* for C 'idiosyncrasies'? :) Where?
 but not for D's ;)
I've been learning D for about 3 months only.
 I would say C has far more pitfalls than D.
No doubt - and I've never said C is "better" than D. I was going to try betterC subset (say, try to implement dynamic arrays), but did not have much free time yet.
 Check out the undefined behaviors for C.
Nothing interesting... Most of UB in C are just programmer's sloppiness. C requires a programmer to be careful/punctual, much more careful, than ... a python, for ex.
Sep 23 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/23/21 8:10 AM, eugene wrote:
 On Wednesday, 22 September 2021 at 18:38:34 UTC, Steven Schveighoffer 
 wrote:
 I find it interesting how you blame yourself for C's idiosyncrasies
Me? Blaming *myself* for C 'idiosyncrasies'? :) Where?
"When my C program crashes, I'm 100% sure I made something stupid" One might argue that C's approach to memory management is a contributor to people writing code that fails.
 I would say C has far more pitfalls than D.
No doubt - and I've never said C is "better" than D. I was going to try betterC subset (say, try to implement dynamic arrays), but did not have much free time yet.
Your assertion that programming in GC languages may be harder than manual memory languages is what I was addressing. My point is that C has a lot more memory management pitfalls than D, not addressing any "better than" arguments.
 
 Check out the undefined behaviors for C.
Nothing interesting... Most of UB in C are just programmer's sloppiness. C requires a programmer to be careful/punctual, much more careful, than ... a python, for ex.
UB in C leaves traps for the programmer, similar to this trap you have found in the GC. Where code doesn't do what you are expecting it to do. -Steve
Sep 23 2021
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 23 September 2021 at 13:05:07 UTC, Steven 
Schveighoffer wrote:
 UB in C leaves traps for the programmer, similar to this trap 
 you have found in the GC. Where code doesn't do what you are 
 expecting it to do.
There is a difference, though. As I've already said, GC is a sort of 'environment', the code of GC exists on its own. In C, no such code is 'inserted' by the compiler into code written by a programmer. So, in C it is MY (potentially wrong) code. In D, it is NOT MY code, it is GC. From this point of view debugging *may* be harder (especially for beginners, like me)
Sep 23 2021
parent jfondren <julian.fondren gmail.com> writes:
On Thursday, 23 September 2021 at 13:30:42 UTC, eugene wrote:
 So, in C it is MY (potentially wrong) code.
 In D, it is NOT MY code, it is GC.
Actually in both cases it is MY+the compiler's code. A very similar example from C-land (without my digging up the exact details) is something like

```c
for (int i = 0; i >= 0; i++) {
    // exit loop on signed integer overflow
}
```

where gcc 2.95 would do what "MY code" said, but later gcc versions would 'optimize' into an infinite loop (followed by dead code that can now be removed):

```c
for (;;) {
    // never exit loop
}
```

Because in math, positive numbers never +1 into negative numbers. And in C this is undefined behavior which is (modern understanding:) complete license for the compiler to do anything at all. And on the specific architecture we are specifically compiling for there is specific behavior--but who cares about that, this is optimization! And if you complained about it, well you were a sloppy coder actually, for wanting the target architecture's actual behavior with your actual code as you actually wrote it.

(If you feel like defending C's honor here, please, I've heard it already. Everybody thinks very highly of the nasal demons joke.)

There are other cases where very security-minded software had defensive code that an optimizer decided would never be needed, that then exposed a software vulnerability, or there are 'unnecessary' writes that are intended to remove a password from memory: https://duckduckgo.com/?q=dead+code+elimination+security+vulnerability
Sep 23 2021
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Sep 21, 2021 at 07:42:48PM +0000, jfondren via Digitalmars-d-learn
wrote:
 On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 I do not understand at all why GC considers those sg0 and sg1 as
 unreferenced.
 And why old gdc (without -Os) and old ldc do not.
Conclusion: There's nothing special about sg0 and sg1, except that they're part of Stopper. The Stopper in main() is collected before the end of main() because it's not used later in the function and because there are apparently no other references to it that the GC can find (because the only reference is hidden inside the Linux epoll API).
Quick and dirty workaround: keep references to those objects in static variables to prevent GC collection:

    auto myFunc(...) {
        static MyType* dontCollect = null;

        MyType* obj = new MyObject(...);
        dontCollect = obj;
        scope(exit) dontCollect = null; // may collect after function exits

        ... // function body goes here
    }

T

-- 
Verbing weirds language. -- Calvin (& Hobbes)
Sep 21 2021
prev sibling parent eugene <dee0xeed gmail.com> writes:
On Monday, 13 September 2021 at 17:18:30 UTC, eugene wrote:
 And the most strange thing is this - if using gdc with -Os 
 flag, the program behaves
 exactly as when compiled with fresh dmd - destructors for sg0 
 and sg1 are called soon after program start.
Now I guess that gdc's optimization for size implies DSE.
Sep 23 2021