digitalmars.D - btdu - a sampling disk usage profiler for btrfs (written in D)
- Vladimir Panteleev (33/33) Nov 08 2020 https://blog.cy.md/2020/11/08/btdu-sampling-disk-usage-profiler-for-btrf...
- user1234 (5/41) Nov 09 2020 I like the report about how D was efficient to develop this
- Vladimir Panteleev (25/29) Nov 09 2020 Well, the README and linked blog post answer that to some extent,
- user1234 (3/32) Nov 09 2020 Alright, it's clearer now, thanks for the clarifications ;)
- matheus (6/15) Nov 09 2020 I read about GC issues like this very often and my question is:
- Vladimir Panteleev (12/15) Nov 09 2020 You can disable the GC and you can run it manually, but this
- Steven Schveighoffer (5/21) Nov 09 2020 It would still help I think, because for instance, the UI is probably
- Jacob Carlborg (20/47) Nov 10 2020 I don't think this is specific to D. I've seen in the past
- Vladimir Panteleev (14/57) Nov 10 2020 I think it might be less of a problem in e.g. Go.
- Ola Fosheim Grøstad (4/10) Nov 10 2020 Would a thread local GC with reference counted shared objects
- Vladimir Panteleev (23/33) Nov 27 2020 I don't think there is a simple answer here.
- Ola Fosheim Grostad (4/14) Nov 27 2020 Hm, but it would only stop a single thread. You would not be
- Vladimir Panteleev (9/24) Nov 27 2020 Right, so that's another imposed limitation of such a GC. You'd
- Ola Fosheim Grostad (7/14) Nov 27 2020 I think it is no different than shared_ptr. I also think one can
- IGotD- (13/18) Nov 27 2020 Reference counting which also means multiple ownership doesn't
- Ola Fosheim Grostad (9/16) Nov 27 2020 You can view ARC as a borrowchecker. If the ARC optimizer
- Ola Fosheim Grostad (5/13) Nov 27 2020 Sadly, templated types that depend on struct size and field
https://blog.cy.md/2020/11/08/btdu-sampling-disk-usage-profiler-for-btrfs/
https://github.com/CyberShadow/btdu

D-related thoughts:

- D programs that build fine on one Linux machine may still fail to build with mysterious linking errors on another, even when using Dub, which takes care of dependency management. I saw two instances of this, caused by differences between DMD/LDC and Arch/Debian (in one of them, for whatever reason, libz is not pulled in on LDC/Debian despite being a Phobos dependency). Also, LDC is the D compiler that's installed by default when the system wants a D compiler (e.g. if you try to install Dub by itself).

- The garbage collector is still a major hindrance for system programming. In this case, the ioctls used were slow, and when the GC tries to stop the world to do its thing, it just hangs the entire program until ALL ioctls in all threads complete. This means it wasn't possible to have a stutter-free interactive UI, so I had to move processing to subprocesses.

- One user wondered why the program needed so many threads. The answer was that half of them were owned by the GC (it never stops its worker threads, they just sit idle).

- I used the Deimos ncurses bindings package. I'm thankful that it already existed, though I had to push some fixes to make static linking work. The most annoying part was waiting overnight for code.dlang.org to pick up the new tags, because there is no way to get it to update a package unless you're the owner, and no way to otherwise specify a dependency unless using a branch (which is deprecated and prints a big warning when your users build your program).

- Nice D features that came in useful: reflection to generate a lightweight serializer/deserializer for subprocess communication; strings as slices to allow processing them without copying them out of the network buffer; and template mixins to add common behavior to types without runtime polymorphism.
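For readers curious what "reflection to generate a serializer" plus "strings as slices" can look like, here is a minimal sketch. It is not btdu's actual protocol: the `Progress` message type, the `serialize`/`deserialize` names and the `name=value;...` text format are all hypothetical. It walks the struct fields with `.tupleof` and `__traits(identifier, ...)`, and string fields are assigned as slices of the input buffer, so they are not copied (and stay valid only as long as that buffer does).

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : findSplit;
import std.array : appender;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical message type, for illustration only.
struct Progress
{
    string path;
    ulong bytes;
    int samples;
}

/// Write every field as "name=value", joined with ';'.
/// Assumes values never contain ';' or '=' (no escaping in this sketch).
string serialize(T)(T value) if (is(T == struct))
{
    auto buf = appender!string;
    foreach (i, field; value.tupleof)
    {
        if (i > 0)
            buf.put(';');
        buf.put(__traits(identifier, T.tupleof[i])); // field name via reflection
        buf.put('=');
        buf.put(field.to!string);
    }
    return buf.data;
}

/// Parse the output of `serialize`. String fields become slices of `data`
/// (no copy); other fields are converted with std.conv.to.
T deserialize(T)(string data) if (is(T == struct))
{
    T result;
    foreach (pair; data.splitter(';'))
    {
        auto kv = pair.findSplit("=");
        foreach (i, ref field; result.tupleof)
        {
            if (kv[0] != __traits(identifier, T.tupleof[i]))
                continue;
            static if (is(typeof(field) == string))
                field = kv[2];                   // slice into `data`, nothing copied
            else
                field = kv[2].to!(typeof(field));
        }
    }
    return result;
}

void main()
{
    auto wire = serialize(Progress("/home", 4096, 3));
    writeln(wire); // path=/home;bytes=4096;samples=3
    auto back = deserialize!Progress(wire);
    writeln(back.path.ptr == wire.ptr + "path=".length); // true: `path` points into `wire`
}
```

Adding a field to the struct updates both directions automatically, which is the main appeal of generating this code from reflection rather than writing it by hand.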
Nov 08 2020
On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev wrote:
> https://blog.cy.md/2020/11/08/btdu-sampling-disk-usage-profiler-for-btrfs/
> https://github.com/CyberShadow/btdu
>
> D-related thoughts:
> [...]

I like the report about how D was efficient to develop this tool, but otherwise, what do you use it for? What is the typical usage of such tools?
Nov 09 2020
On Monday, 9 November 2020 at 12:21:55 UTC, user1234 wrote:
> I like the report about how D was efficient to develop this tool, otherwise what do you use it for ? What is the typical usage of such tools ?

Well, the README and linked blog post answer that to some extent, but my personal use cases are actually tangential to D, so I can write more about that here.

I've been using btrfs on my home system ever since switching to Linux full-time, and a few years ago I switched over the server (hosting this forum / the wiki / some other services) to it too. This allowed us to have incremental, atomic, hourly, off-site backups, which actually saved our butts big-time when the hosting provider decided to shut off the server over a clerical issue in the distant year of 2019. Some snapshots are also retained for a while to allow rollbacks or undeleting files in case I fat-finger something during maintenance.

One of btrfs's boons is that across subvolumes and clones, deduplication allows reusing the same unique block across many files and snapshots, which saves space and is also what enables atomic snapshots to work (with successive writes being COW). If you add compression on top of that, it can be challenging to understand what is actually using how much space, and since storage costs are not insignificant on a FOSS budget, it does need to be managed, and I was missing a tool that would help do this.

Another unique benefit of btdu is that it starts displaying results almost instantly, which is great when the disk is full, everything is on fire, and you need to free up some disk space right now.
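Why can a sampling profiler show results "almost instantly"? A toy model helps, assuming the sampling works roughly as the name suggests: probe random positions on the disk, look up what owns each one, and count. The disk model, owner labels and `estimateShares` below are hypothetical and ignore btrfs specifics such as one block being shared by many files and snapshots. The point is only the statistics: each owner's share is estimated as hits divided by samples, and the estimate is already usable after a few hundred probes and keeps sharpening as more arrive.

```d
import std.random : Random, uniform;
import std.stdio : writefln;

/// Estimate the fraction of a "disk" each owner occupies by probing
/// `samples` random blocks instead of scanning every block.
/// Toy model: `blocks[i]` names the (hypothetical) owner of block i.
double[string] estimateShares(string[] blocks, size_t samples, ref Random rng)
{
    size_t[string] hits;
    foreach (_; 0 .. samples)
    {
        auto owner = blocks[uniform(size_t(0), blocks.length, rng)];
        if (auto p = owner in hits)
            ++*p;
        else
            hits[owner] = 1;
    }
    double[string] shares;
    foreach (owner, count; hits)
        shares[owner] = cast(double) count / samples;
    return shares;
}

void main()
{
    // Hypothetical disk: 60% snapshots, 30% home, 10% logs.
    string[] blocks;
    foreach (i; 0 .. 1000)
        blocks ~= i < 600 ? "snapshots" : i < 900 ? "home" : "logs";

    auto rng = Random(42);
    // More samples give a tighter estimate; fewer give a rough answer sooner.
    foreach (n; [100, 1_000, 10_000])
    {
        auto shares = estimateShares(blocks, n, rng);
        writefln("%6d samples: snapshots ~%.1f%%", n, shares.get("snapshots", 0.0) * 100);
    }
}
```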
Nov 09 2020
On Monday, 9 November 2020 at 12:52:12 UTC, Vladimir Panteleev wrote:
> On Monday, 9 November 2020 at 12:21:55 UTC, user1234 wrote:
>> I like the report about how D was efficient to develop this tool, otherwise what do you use it for ? What is the typical usage of such tools ?
>
> Well, the README and linked blog post answer that to some extent, but my personal use cases are actually tangential to D, so I can write more about that here. [...]

Alright, it's clearer now, thanks for the clarifications ;)
Nov 09 2020
On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev wrote:
> ...
> - The garbage collector is still a major hindrance for system programming. In this case it was due to the ioctls used being slow, and when the GC tries to stop the world to do its thing, it just hangs the entire program until ALL ioctls in all threads complete. This means it wasn't possible to have a stutter-free interactive UI, so I had to move processing to subprocesses.
> ...

I read about GC issues like this very often, and my question is: can't the GC be set to just run without collecting anything, and then be told manually to collect after a process is finished?

Matheus.
Nov 09 2020
On Monday, 9 November 2020 at 13:33:50 UTC, matheus wrote:
> I read about GC issues like this very often and my question is: Can't GC be set just to run without collecting anything, and manually set it to collect after a process is finished?

You can disable the GC and you can run it manually, but this wouldn't help in this case, because the ioctls are run across threads in an overlapping way. It would be possible if the program was designed such that every once in a while, the main thread tells all worker threads "OK, let's do a GC, so nobody starts any new ioctls for now", and when the last ioctl finishes, runs the GC and then lets the worker threads start ioctls again. But this means that all but one of the worker threads may sit idle, waiting for the last ioctl to finish. ioctl duration varies from milliseconds to seconds in this case, so it would noticeably affect throughput.
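The "disable the GC and run it manually" approach maps onto druntime's `core.memory.GC.disable`, `GC.enable` and `GC.collect`. The sketch below wraps them in a hypothetical helper, `withDeferredCollection` (not a druntime API). As the reply above explains, it only controls *when* a collection happens: the explicit `GC.collect()` is still stop-the-world, so in btdu's case it would still wait on in-flight ioctls.

```d
import core.memory : GC;
import std.conv : to;
import std.stdio : writeln;

/// Run `work` with automatic collections suppressed, then collect once at a
/// point of the caller's choosing. Hypothetical helper, not part of druntime.
auto withDeferredCollection(Fn)(scope Fn work)
{
    // Allocations still succeed; automatic collections are suppressed
    // (druntime may still collect if it must, e.g. when out of memory).
    GC.disable();
    scope (exit)
    {
        GC.enable();
        // Still a stop-the-world collection: every thread registered with
        // the runtime has to be suspended before marking can start.
        GC.collect();
    }
    return work();
}

void main()
{
    auto length = withDeferredCollection(() {
        string s;
        foreach (i; 0 .. 1000)
            s ~= i.to!string; // many small GC allocations, no collection in between
        return s.length;
    });
    writeln(length); // 2890
}
```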
Nov 09 2020
On 11/9/20 8:41 AM, Vladimir Panteleev wrote:
> On Monday, 9 November 2020 at 13:33:50 UTC, matheus wrote:
>> I read about GC issues like this very often and my question is: Can't GC be set just to run without collecting anything, and manually set it to collect after a process is finished?
>
> You can disable the GC and you can run it manually, but this wouldn't help in this case, because the ioctls are run across threads in an overlapping way. [...] ioctl duration varies from milliseconds to seconds in this case, so it would noticeably affect throughput.

It would still help, I think, because, for instance, the UI is probably not running ioctls, and so it wouldn't pause while you are waiting for the ioctl-running threads to finish.

-Steve
Nov 09 2020
On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev wrote:
> - D programs that build fine on one Linux machine may still fail to build with mysterious linking errors on another, even when using Dub, which takes care of dependency management. I saw two instances of this, caused by differences between DMD/LDC and Arch/Debian (in one of them, for whatever reason, libz is not pulled in on LDC/Debian despite being a Phobos dependency). Also, LDC is the D compiler that's installed by default when the system wants a D compiler (e.g. if you try to install Dub by itself).

I don't think this is specific to D. I've seen problems in the past caused by package maintainers not building the package the same way as upstream, or splitting a package into multiple packages.

> - The garbage collector is still a major hindrance for system programming. In this case it was due to the ioctls used being slow, and when the GC tries to stop the world to do its thing, it just hangs the entire program until ALL ioctls in all threads complete.

You should probably never let the GC run on a realtime thread, like audio or video processing (not sure if ioctls fall into this category). These days, modern UIs should probably fall into the realtime category.

> This means it wasn't possible to have a stutter-free interactive UI, so I had to move processing to subprocesses.

I'm not sure it's ever possible to have a completely stutter-free UI with a stop-the-world GC.

> - One user wondered why the program needed so many threads. The answer was that half of them were owned by the GC (it never stops its worker threads, they just sit idle).

Is that the answer? I mean, the GC doesn't create any threads by itself, does it?

> - I used the Deimos ncurses bindings package. I'm thankful that it already existed, though I had to push some fixes to make static linking work. The most annoying part was waiting overnight for code.dlang.org to pick up the new tags, because there is no way to get it to update a package unless you're the owner, and no way to otherwise specify a dependency unless using a branch (which is deprecated and prints a big warning when your users build your program).

Since 2.094.0, you can specify a Git repository as a dependency [1]. You can also specify a local path as a dependency [2], which is useful when developing a library and an application at the same time as two separate Dub packages.

[1] https://dlang.org/changelog/2.094.0.html#git-paths
[2] https://dub.pm/package-format-sdl.html#version-specs
Nov 10 2020
On Tuesday, 10 November 2020 at 09:40:33 UTC, Jacob Carlborg wrote:
> On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev wrote:
>> - D programs that build fine on one Linux machine may still fail to build with mysterious linking errors on another, even when using Dub, which takes care of dependency management. [...]
>
> I don't think this is specific to D. I've seen in the past problems caused by package maintainers not building the package in the same way as upstream. Or they split up a package in multiple packages.

I think it might be less of a problem in e.g. Go.

>> - The garbage collector is still a major hindrance for system programming. In this case it was due to the ioctls used being slow, and when the GC tries to stop the world to do its thing, it just hangs the entire program until ALL ioctls in all threads complete.
>
> You should probably never let the GC run on a realtime thread, like audio or video processing (not sure if ioctls falls into this category). These days, modern UIs should probably fall into the realtime category.

Doing UI without GC in D would be pretty painful. But by itself, the GC doesn't add enough latency to introduce stutter in the UI: a GC scan is generally quick enough that the UI doesn't feel laggy or stuttery. The problem is that the GC waits for all threads to finish their ioctls, while the program is otherwise completely suspended. This affects not just the UI, but throughput.

>> - One user wondered why the program needed so many threads. The answer was that half of them were owned by the GC (it never stops its worker threads, they just sit idle).
>
> Is that the answer? I mean, the GC doesn't create any threads by itself, does it?

Yes, it does, since the introduction of parallel heap scanning in 2.087: https://dlang.org/changelog/2.087.0.html#gc_parallel

>> - I used the Deimos ncurses bindings package. [...]
>
> Since 2.094.0, you can specify a Git repository as a dependency [1]. You can also specify a local path as a dependency [2], useful when developing a library and an application at the same time, as two separate Dub packages.
> [1] https://dlang.org/changelog/2.094.0.html#git-paths
> [2] https://dub.pm/package-format-sdl.html#version-specs

This is super useful. Thanks.
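Since the extra threads come from parallel marking, a program that prefers fewer threads can opt out through druntime's GC configuration. A minimal sketch, assuming the `parallel` GC option introduced alongside parallel marking in 2.087: embedding `rt_options` in the executable asks the GC to do all marking on the collecting thread, so it starts no extra mark threads. The same setting can be given on the command line as `--DRT-gcopt=parallel:0`. This trades collection speed on multi-core machines for a smaller thread count.

```d
import core.memory : GC;
import std.stdio : writeln;

// Embedded druntime options: "parallel:0" disables the additional mark
// threads, so marking happens only on the thread that triggered the collection.
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];

void main()
{
    auto data = new int[](1_000_000);
    GC.collect(); // single-threaded mark phase with the option above
    writeln("still reachable: ", data.length);
}
```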
Nov 10 2020
On Tuesday, 10 November 2020 at 10:42:09 UTC, Vladimir Panteleev wrote:
> But by itself, the GC doesn't add enough latency to introduce stutter in the UI: a GC scan is generally quick enough that the UI doesn't feel laggy or stuttery. The problem is that the GC waits for all threads to finish their ioctls, while the program is otherwise completely suspended. This affects not just the UI, but throughput.

Would a thread-local GC with reference-counted shared objects work for your use case?
Nov 10 2020
On Tuesday, 10 November 2020 at 13:55:52 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 10 November 2020 at 10:42:09 UTC, Vladimir Panteleev wrote:
>> But by itself, the GC doesn't add enough latency to introduce stutter in the UI: a GC scan is generally quick enough that the UI doesn't feel laggy or stuttery. The problem is that the GC waits for all threads to finish their ioctls, while the program is otherwise completely suspended. This affects not just the UI, but throughput.
>
> Would a thread local GC with reference counted shared objects work for your use case?

I don't think there is a simple answer here.

Removing the global GC lock for allocations, and allowing each thread to allocate from its own private pool, would greatly improve the performance of multi-threaded applications. For example, the global GC lock was what was preventing moving more processing in DustMite to worker threads: currently, for GC-dependent code it's often better to keep everything in one thread than to use worker threads, specifically because of the overhead of the global GC lock. I think such a modification would be possible without radical changes to the language or GC design, but it's possible I'm missing something.

However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC. A theoretical non-stop-the-world GC would indeed help in this situation, but such a GC is only possible if you restrict the language to a subset such that all copies of managed objects are always visible to the compiler. It would require all system / extern(C) code to be carefully re-scrutinized. In short, this would essentially be a different language (based on D). I don't think we can get there from where we are now.
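The "each thread allocates from its own private pool" idea can be illustrated with a toy bump allocator. This is a hypothetical sketch (`arena`, `used` and `threadLocalAlloc` are made-up names) and not how druntime's GC is structured. It leans on a D language rule: module-level variables are thread-local unless marked `shared` or `__gshared`, so each thread gets its own pool and counter and no lock is needed on the allocation path.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Module-level variables are thread-local by default in D,
// so every thread sees its own `arena` and `used`.
ubyte[] arena;
size_t used;

/// Toy bump allocator over a per-thread pool: no locking, because no other
/// thread can see this thread's `arena` or `used`.
void[] threadLocalAlloc(size_t size)
{
    if (arena is null)
        arena = new ubyte[](64 * 1024);    // one-time (locked) GC allocation per thread
    size = (size + 15) & ~cast(size_t) 15;  // round up to 16-byte granularity
    if (used + size > arena.length)
        return null;                        // exhausted; a real allocator would fetch another chunk
    auto block = arena[used .. used + size];
    used += size;
    return block;
}

void main()
{
    size_t workerUsed;
    auto worker = new Thread({
        threadLocalAlloc(10);
        workerUsed = used;                  // the worker's own counter
    });
    worker.start();
    worker.join();

    threadLocalAlloc(40);
    writeln("main: ", used, ", worker: ", workerUsed); // main: 48, worker: 16
}
```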
Nov 27 2020
On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev wrote:
> However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC. A theoretical non-stop-the-world GC would indeed help in this situation, but such a GC is only possible if you restrict the language to a subset, such that all copies of managed objects are always visible to the compiler. It would require all system / extern(C) code to be carefully re-scrutinized. In short, this would essentially be a different language (based on D). I don't think we can get there from where we are now.

Hm, but it would only stop a single thread. You would not be allowed to share non-pinned objects with other threads.
Nov 27 2020
On Friday, 27 November 2020 at 10:26:18 UTC, Ola Fosheim Grostad wrote:
> On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev wrote:
>> However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC. [...] I don't think we can get there from where we are now.
>
> Hm, but it would only stop a single thread. You would not be allowed to share nonpinned objects with other threads.

Right, so that's another imposed limitation of such a GC. You'd also still lose the ability to memcpy or memset a struct that had managed pointers, as that would break the reference count that the GC relies on to work. It would definitely solve the performance problem, but it would be such a radical change that it would essentially be a different language (and arguably no longer a system-programming one).
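The memcpy problem is easy to reproduce with any reference-counted handle. In this hypothetical sketch (`Handle` and the module-level `refs` counter are made up), the count stays correct only while every copy goes through the postblit and every drop goes through the destructor. A raw byte copy skips the postblit but still gets destroyed, so the count ends up wrong, which is exactly what a collector relying on it could not tolerate.

```d
import core.stdc.string : memcpy;
import std.stdio : writeln;

int refs; // count of live handles to one imaginary managed object (thread-local)

struct Handle
{
    bool live;
    this(bool l) { live = l; if (live) ++refs; }
    this(this) { if (live) ++refs; }  // language-level copy: count updated
    ~this() { if (live) --refs; }     // drop: count updated
}

void main()
{
    {
        auto a = Handle(true);
        auto b = a;                    // postblit runs: refs == 2
        Handle c;
        memcpy(&c, &a, Handle.sizeof); // raw byte copy: postblit skipped, refs still 2
        writeln("handles: 3, refs: ", refs);
    }                                  // all three destructors run
    writeln("after scope: ", refs);    // -1: the count is now wrong
}
```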
Nov 27 2020
On Friday, 27 November 2020 at 10:31:21 UTC, Vladimir Panteleev wrote:
> Right, so that's another imposed limitation of such a GC. You'd still also lose the ability to memcpy or memset a struct that had managed pointers, as that would break the reference count that the GC relies on to work. It would definitely solve the performance problem, but it would be such a radical change that it would essentially be a different language (and debatedly no longer a system-programming one).

I think it is no different than shared_ptr. I also think one can add some safety through global pointer analysis for existing code. Let the pinning be done by a counter: when you pin the object, you get a smart pointer, borrowed_ptr... When the count goes to zero, the object is local again.
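The pin counter described above can be sketched as an RAII handle. Everything here is hypothetical: `Node`, `BorrowedPtr` (named after the `borrowed_ptr` idea) and `isLocal` are illustrative names, and a real thread-local GC would also have to consult the counter before moving or freeing an object. Creating or copying a `BorrowedPtr` increments an atomic pin count, destroying one decrements it, and the object counts as thread-local again once the count is back to zero.

```d
import core.atomic : atomicLoad, atomicOp;
import std.stdio : writeln;

// Hypothetical object owned by one thread's local GC. `pins` counts
// outstanding cross-thread borrows.
class Node
{
    shared int pins;
    int payload;
}

/// Creating or copying one pins the object; destroying one unpins it.
struct BorrowedPtr
{
    private Node node;

    this(Node n)
    {
        node = n;
        atomicOp!"+="(node.pins, 1);
    }

    this(this)
    {
        if (node !is null)
            atomicOp!"+="(node.pins, 1); // a copy is another borrow
    }

    ~this()
    {
        if (node !is null)
            atomicOp!"-="(node.pins, 1); // at zero, the object is local again
    }

    Node get() { return node; }
}

/// True when no borrows are outstanding.
bool isLocal(Node n) { return atomicLoad(n.pins) == 0; }

void main()
{
    auto n = new Node;
    writeln(isLocal(n));             // true
    {
        auto p = BorrowedPtr(n);
        auto q = p;                  // second borrow
        writeln(atomicLoad(n.pins)); // 2
    }
    writeln(isLocal(n));             // true
}
```

The atomic increments on every copy are the runtime cost IGotD- raises in the next reply; an optimizer that can prove a borrow is short-lived could elide them.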
Nov 27 2020
On Friday, 27 November 2020 at 10:41:54 UTC, Ola Fosheim Grostad wrote:
> I think it is no different than shared_ptr. I also think one can add some safety through global pointer analysis for existing code. Let the pinning be done by a counter, when you pin the object you get a smartpointer borrowed_ptr... when the count goes to zero, the object is local again.

Reference counting, which also means multiple ownership, doesn't play well with any borrowing mechanism. The reason is that the compiler cannot determine at compile time whether a borrow is allowed, so it must insert runtime checks instead. This reduces performance, probably not by a lot, but still. Let's leave the borrow checker outside D and just have good old reference counting; that's what we need.

Speaking of a parallel GC: even if we have atomic reference counting or another parallel method, the underlying malloc/free must also be non-blocking, or at least keep locking to a minimum. Many libc implementations already do this, though.
Nov 27 2020
On Friday, 27 November 2020 at 11:31:48 UTC, IGotD- wrote:
> Reference counting, which also means multiple ownership, doesn't play well with any borrowing mechanism. The reason is that the compiler cannot determine at compile time whether a borrow is allowed, so it must insert runtime checks instead. This reduces performance, probably not by a lot, but still. Let's leave the borrow checker outside D and just have good old reference counting; that's what we need.

You can view ARC as a borrow checker. If the ARC optimizer succeeds globally, then all acquire/release operations can be omitted for that type (or that call-graph path). The problem is interior pointers, which would require fat pointers or a borrow checker... But again, you could rewrite those fat pointers if ARC optimization is highly successful (if the code validates like it would for a borrow checker).
Nov 27 2020
On Friday, 27 November 2020 at 12:00:40 UTC, Ola Fosheim Grostad wrote:
> You can view ARC as a borrowchecker. If the ARC optimizer succeeds globally then all acquire/release can be omitted for that type (or that call graph path). The problem is interior pointers which would require fat pointers or borrowchecker... But again you could rewrite those fat pointers if ARC optimization is highly successful (if the code validates like it would for a borrow

Sadly, templated types that depend on struct size and field offsets could be a problem for such rewrites... So the compiler would have to annotate structs with dependencies...
Nov 27 2020