digitalmars.D - btdu - a sampling disk usage profiler for btrfs (written in D)

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

https://blog.cy.md/2020/11/08/btdu-sampling-disk-usage-profiler-for-btrfs/

https://github.com/CyberShadow/btdu

D-related thoughts:

- D programs that build fine on one Linux machine may still fail 
to build with mysterious linking errors on another, even when 
using Dub which takes care of dependency management. I saw two 
counts of this, caused by differences in DMD/LDC and Arch/Debian 
(one being that, for whatever reason, libz is not pulled in on 
LDC/Debian despite being a Phobos dependency). Also, LDC is the D 
compiler that's installed by default when the system wants a D 
compiler (e.g. if you try to install Dub by itself).

- The garbage collector is still a major hindrance for system 
programming. In this case it was due to the ioctls used being 
slow, and when the GC tries to stop the world to do its thing, it 
just hangs the entire program until ALL ioctls in all threads 
complete. This means it wasn't possible to have a stutter-free 
interactive UI, so I had to move processing to subprocesses.

- One user wondered why the program needed so many threads. The 
answer was that half of them were owned by the GC (it never stops 
its worker threads, they just sit idle).

- I used the Deimos ncurses bindings package. I'm thankful that 
it already existed, though I had to push some fixes to fix static 
linking. The most annoying part was waiting overnight for 
code.dlang.org to pick up the new tags, because there is no way 
to get it to update a package unless you're the owner, and no way 
to otherwise specify a dependency unless using a branch (which is 
deprecated and prints a big warning when your users build your 
program).

- Nice D features that came in useful: reflection to generate a 
lightweight serializer/deserializer for subprocess communication; 
strings as slices to allow processing them without copying them 
out of the network buffer; and template mixins to add common 
behavior to types without runtime polymorphism.
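
As an illustration of the reflection point above, here is a minimal sketch of a field-by-field serializer built on `.tupleof`. It is not btdu's actual code: `SampleMessage`, `serialize`, and `deserialize` are hypothetical names, and the sketch assumes messages contain only integral fields and that both ends of the pipe share the same endianness.

```d
import std.traits : isIntegral;

// Hypothetical message type; btdu's real messages are not shown in the post.
struct SampleMessage
{
    ulong offset;
    uint duration;
}

// Append each field of `msg` to `buf` in declaration order,
// using compile-time reflection over the struct's fields.
void serialize(T)(ref ubyte[] buf, ref const T msg)
{
    foreach (ref field; msg.tupleof)
    {
        static assert(isIntegral!(typeof(field)),
            "this sketch handles integral fields only");
        buf ~= (cast(const(ubyte)*) &field)[0 .. field.sizeof];
    }
}

// Read the fields back in the same order and return the unread bytes.
// The slice copy is bounds-checked, so a short buffer fails loudly.
const(ubyte)[] deserialize(T)(const(ubyte)[] buf, out T msg)
{
    foreach (ref field; msg.tupleof)
    {
        (cast(ubyte*) &field)[0 .. field.sizeof] = buf[0 .. field.sizeof];
        buf = buf[field.sizeof .. $];
    }
    return buf;
}
```

Because the loop over `.tupleof` is unrolled at compile time, adding a field to the struct extends both functions without any further code.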

Nov 08 2020

user1234 <user1234 12.de> writes:

On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev 
wrote:
 https://blog.cy.md/2020/11/08/btdu-sampling-disk-usage-profiler-for-btrfs/

 https://github.com/CyberShadow/btdu

 D-related thoughts:

 - D programs that build fine on one Linux machine may still 
 fail to build with mysterious linking errors on another, even 
 when using Dub which takes care of dependency management. I saw 
 two counts of this, caused by differences in DMD/LDC and 
 Arch/Debian (one being that, for whatever reason, libz is not 
 pulled in on LDC/Debian despite being a Phobos dependency). 
 Also, LDC is the D compiler that's installed by default when 
 the system wants a D compiler (e.g. if you try to install Dub 
 by itself).

 - The garbage collector is still a major hindrance for system 
 programming. In this case it was due to the ioctls used being 
 slow, and when the GC tries to stop the world to do its thing, 
 it just hangs the entire program until ALL ioctls in all 
 threads complete. This means it wasn't possible to have a 
 stutter-free interactive UI, so I had to move processing to 
 subprocesses.

 - One user wondered why the program needed so many threads. The 
 answer was that half of them were owned by the GC (it never 
 stops its worker threads, they just sit idle).

 - I used the Deimos ncurses bindings package. I'm thankful that 
 it already existed, though I had to push some fixes to fix 
 static linking. The most annoying part was waiting overnight 
 for code.dlang.org to pick up the new tags, because there is no 
 way to get it to update a package unless you're the owner, and 
 no way to otherwise specify a dependency unless using a branch 
 (which is deprecated and prints a big warning when your users 
 build your program).

 - Nice D features that came in useful: reflection to generate a 
 lightweight serializer/deserializer for subprocess 
 communication; strings as slices to allow processing them 
 without copying them out of the network buffer; and template 
 mixins to add common behavior to types without runtime 
 polymorphism.

I like the report about how efficient D was for developing this 
tool. Otherwise,
what do you use it for? What is the typical usage of such tools?

Nov 09 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Monday, 9 November 2020 at 12:21:55 UTC, user1234 wrote:
 I like the report about how D was efficient to develop this 
 tool, otherwise
 what do you use it for ? What is the typical usage of such 
 tools ?

Well, the README and linked blog post answer that to some extent, 
but my personal use cases are actually tangential to D, so I can 
write more about that here.

I've been using btrfs on my home system ever since switching to 
Linux full-time, and a few years ago I switched over the server 
(hosting this forum / the wiki / some other services) to it too. 
This allowed us to have incremental, atomic, hourly, off-site 
backups, which actually saved our butts big-time when the hosting 
provider decided to shut off the server over a clerical issue in 
the distant year of 2019. Some snapshots are also retained for a 
while to allow rollbacks or undelete files in case I fat-finger 
something during maintenance.

One of btrfs's boons is that across subvolumes and clones, 
deduplication allows reusing the same unique block across many 
files and snapshots, which saves space and is also what enables 
atomic snapshots to work (with successive writes being COW). If 
you add compression on top of that, it can be challenging to 
understand what is actually using how much space, and since 
storage costs are not insignificant on a FOSS budget, it does 
need to be managed, and I was missing a tool that would help do 
this. Another unique benefit of btdu is that it starts displaying 
results almost instantly, which is great when the disk is full 
causing everything to be on fire and you need to free up some 
disk space right now.

Nov 09 2020

user1234 <user1234 12.de> writes:

On Monday, 9 November 2020 at 12:52:12 UTC, Vladimir Panteleev 
wrote:
 On Monday, 9 November 2020 at 12:21:55 UTC, user1234 wrote:
 I like the report about how D was efficient to develop this 
 tool, otherwise
 what do you use it for ? What is the typical usage of such 
 tools ?

 Well, the README and linked blog post answer that to some 
 extent, but my personal use cases are actually tangential to D, 
 so I can write more about that here.

 I've been using btrfs on my home system ever since switching to 
 Linux full-time, and a few years ago I switched over the server 
 (hosting this forum / the wiki / some other services) to it 
 too. This allowed us to have incremental, atomic, hourly, 
 off-site backups, which actually saved our butts big-time when 
 the hosting provider decided to shut off the server over a 
 clerical issue in the distant year of 2019. Some snapshots are 
 also retained for a while to allow rollbacks or undelete files 
 in case I fat-finger something during maintenance.

 One of btrfs's boons is that across subvolumes and clones, 
 deduplication allows reusing the same unique block across many 
 files and snapshots, which saves space but also what enables 
 atomic snapshots to work (with successive writes being COW). If 
 you add compression on top of that, it can be challenging to 
 understand what is actually using how much space, and since 
 storage costs are not insignificant on a FOSS budget, it does 
 need to be managed, and I was missing a tool that would help do 
 this. Another unique benefit of btdu is that it starts 
 displaying results almost instantly, which is great when the 
 disk is full causing everything to be on fire and you need to 
 free up some disk space right now.

All right, it's clearer now, thanks for the clarifications ;)

Nov 09 2020

matheus <matheus gmail.com> writes:

On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev 
wrote:
 ...
 - The garbage collector is still a major hindrance for system 
 programming. In this case it was due to the ioctls used being 
 slow, and when the GC tries to stop the world to do its thing, 
 it just hangs the entire program until ALL ioctls in all 
 threads complete. This means it wasn't possible to have a 
 stutter-free interactive UI, so I had to move processing to 
 subprocesses.
 ...

I read about GC issues like this very often and my question is: 
Can't GC be set just to run without collecting anything, and 
manually set it to collect after a process is finished?

Matheus.

Nov 09 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Monday, 9 November 2020 at 13:33:50 UTC, matheus wrote:
 I read about GC issues like this very often and my question is: 
 Can't GC be set just to run without collecting anything, and 
 manually set it to collect after a process is finished?

You can disable the GC and you can run it manually, but this 
wouldn't help in this case, because the ioctls are run across 
threads in an overlapping way. It would be possible if the 
program was designed such that every once in a while, the main 
thread tells all worker threads "OK, let's do a GC so nobody 
starts any new ioctls for now", and when the last ioctl finishes 
run the GC and then let worker threads start ioctls again, but 
this means that up to all but one worker threads are idle and 
waiting for the last ioctl to finish. ioctl duration varies from 
milliseconds to seconds in this case, so it would noticeably 
affect throughput.
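
For reference, the "disable, then collect manually" approach from the question looks roughly like the sketch below. `doWork` and `runWithManualCollection` are hypothetical names; only `GC.disable`, `GC.enable`, and `GC.collect` from `core.memory` are real API. As explained above, this only controls when a collection starts. The collection still has to stop every thread, so it does not avoid waiting on in-flight ioctls.

```d
import core.memory : GC;

// Hypothetical unit of work standing in for the real processing.
int[] doWork(size_t n)
{
    auto result = new int[](n);
    foreach (i, ref x; result)
        x = cast(int) i;
    return result;
}

// Run the work with automatic collections suppressed,
// then collect once at a point of our choosing.
size_t runWithManualCollection()
{
    GC.disable();              // suppress automatic collections (the runtime
                               // may still collect if it runs out of memory)
    scope (exit) GC.enable();  // restore normal behaviour on the way out

    size_t total;
    foreach (round; 0 .. 100)
        total += doWork(1000).length;

    GC.collect();              // explicit full collection
    return total;
}
```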

Nov 09 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 11/9/20 8:41 AM, Vladimir Panteleev wrote:
 On Monday, 9 November 2020 at 13:33:50 UTC, matheus wrote:
 I read about GC issues like this very often and my question is: Can't 
 GC be set just to run without collecting anything, and manually set it 
 to collect after a process is finished?

 
 You can disable the GC and you can run it manually, but this wouldn't 
 help in this case, because the ioctls are run across threads in an 
 overlapping way. It would be possible if the program was designed such 
 that every once in a while, the main thread tells all worker threads 
 "OK, let's do a GC so nobody starts any new ioctls for now", and when the 
 last ioctl finishes run the GC and then let worker threads start ioctls 
 again, but this means that up to all but one worker threads are idle and 
 waiting for the last ioctl to finish. ioctl duration varies from 
 milliseconds to seconds in this case, so it would noticeably affect 
 throughput.
 

It would still help I think, because for instance, the UI is probably 
not running ioctls, and so it wouldn't pause while you are waiting for 
the ioctl-running threads to finish.

-Steve

Nov 09 2020

Jacob Carlborg <doob me.com> writes:

On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev 
wrote:

 - D programs that build fine on one Linux machine may still 
 fail to build with mysterious linking errors on another, even 
 when using Dub which takes care of dependency management. I saw 
 two counts of this, caused by differences in DMD/LDC and 
 Arch/Debian (one being that, for whatever reason, libz is not 
 pulled in on LDC/Debian despite being a Phobos dependency). 
 Also, LDC is the D compiler that's installed by default when 
 the system wants a D compiler (e.g. if you try to install Dub 
 by itself).

I don't think this is specific to D. I've seen in the past 
problems caused by package maintainers not building the package 
in the same way as upstream. Or they split up a package in 
multiple packages.

 - The garbage collector is still a major hindrance for system 
 programming. In this case it was due to the ioctls used being 
 slow, and when the GC tries to stop the world to do its thing, 
 it just hangs the entire program until ALL ioctls in all 
 threads complete.

You should probably never let the GC run on a realtime thread, 
like audio or video processing (not sure if ioctls fall into 
this category). These days, modern UIs should probably fall into 
the realtime category.

 This means it wasn't possible to have a stutter-free 
 interactive UI, so I had to move processing to subprocesses.

I'm not sure if it's possible to ever have a completely 
stutter-free UI with a stop-the-world GC.

 - One user wondered why the program needed so many threads. The 
 answer was that half of them were owned by the GC (it never 
 stops its worker threads, they just sit idle).

Is that the answer? I mean, the GC doesn't create any threads by 
itself, does it?

 - I used the Deimos ncurses bindings package. I'm thankful that 
 it already existed, though I had to push some fixes to fix 
 static linking. The most annoying part was waiting overnight 
 for code.dlang.org to pick up the new tags, because there is no 
 way to get it to update a package unless you're the owner, and 
 no way to otherwise specify a dependency unless using a branch 
 (which is deprecated and prints a big warning when your users 
 build your program).

Since 2.094.0, you can specify a Git repository as a dependency 
[1]. You can also specify a local path as a dependency [2], 
useful when developing a library and an application at the same 
time, as two separate Dub packages.

[1] https://dlang.org/changelog/2.094.0.html#git-paths
[2] https://dub.pm/package-format-sdl.html#version-specs

Nov 10 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Tuesday, 10 November 2020 at 09:40:33 UTC, Jacob Carlborg 
wrote:
 On Sunday, 8 November 2020 at 17:23:32 UTC, Vladimir Panteleev 
 wrote:

 - D programs that build fine on one Linux machine may still 
 fail to build with mysterious linking errors on another, even 
 when using Dub which takes care of dependency management. I 
 saw two counts of this, caused by differences in DMD/LDC and 
 Arch/Debian (one being that, for whatever reason, libz is not 
 pulled in on LDC/Debian despite being a Phobos dependency). 
 Also, LDC is the D compiler that's installed by default when 
 the system wants a D compiler (e.g. if you try to install Dub 
 by itself).

 I don't think this is specific to D. I've seen in the past 
 problems caused by package maintainers not building the package 
 in the same way as upstream. Or they split up a package in 
 multiple packages.

I think it might be less of a problem in e.g. Go.

 - The garbage collector is still a major hindrance for system 
 programming. In this case it was due to the ioctls used being 
 slow, and when the GC tries to stop the world to do its thing, 
 it just hangs the entire program until ALL ioctls in all 
 threads complete.

 You should probably never let the GC run on a realtime thread, 
 like audio or video processing (not sure if ioctls falls into 
 this category). These days, modern UIs should probably fall 
 into the realtime category.

Doing UI without GC in D would be pretty painful.

But by itself, the GC doesn't add enough latency to introduce 
stutter in the UI - a GC scan is generally quick enough that the 
UI doesn't feel laggy or stuttery. The problem is that the GC is 
waiting for all threads to finish their ioctls, while the program 
otherwise is completely suspended. This affects not just UI, but 
throughput.

 - One user wondered why the program needed so many threads. 
 The answer was that half of them were owned by the GC (it 
 never stops its worker threads, they just sit idle).

 Is that the answer? I mean, the GC doesn't create any threads 
 by itself, does it?

Yes, it does, since the introduction of parallel heap scanning in 
2.087:

https://dlang.org/changelog/2.087.0.html#gc_parallel
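
For programs where the extra idle threads are unwanted, parallel marking can, as far as I know, be switched off with the `parallel:0` GC option described in that changelog entry. A minimal sketch, assuming you want to bake the option into the binary rather than pass `--DRT-gcopt=parallel:0` on the command line:

```d
// Module-level declaration read by the D runtime at startup.
// "parallel:0" asks the GC not to use additional marking threads;
// this trades multi-core scanning speed for a lower thread count.
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];
```

Whether this is worthwhile depends on heap size: with a small heap the helper threads buy little, so dropping them mostly just tidies up the thread list.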

 - I used the Deimos ncurses bindings package. I'm thankful 
 that it already existed, though I had to push some fixes to 
 fix static linking. The most annoying part was waiting 
 overnight for code.dlang.org to pick up the new tags, because 
 there is no way to get it to update a package unless you're 
 the owner, and no way to otherwise specify a dependency unless 
 using a branch (which is deprecated and prints a big warning 
 when your users build your program).

 Since 2.094.0, you can specify a Git repository as a dependency 
 [1]. You can also specify a local path as a dependency [2], 
 useful when developing a library and an application at the same 
 time, as two separate Dub packages.

 [1] https://dlang.org/changelog/2.094.0.html#git-paths
 [2] https://dub.pm/package-format-sdl.html#version-specs

This is super useful. Thanks.

Nov 10 2020

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 10 November 2020 at 10:42:09 UTC, Vladimir Panteleev 
wrote:
 But, by itself the GC doesn't add much latency to introduce 
 stutter in the UI - a GC scan is generally quick enough that 
 the UI doesn't feel laggy or stuttery. The problem is that the 
 GC is waiting for all threads to finish their ioctls, while the 
 program otherwise is completely suspended. This affects not 
 just UI, but throughput.

Would a thread local GC with reference counted shared objects 
work for your use case?

Nov 10 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Tuesday, 10 November 2020 at 13:55:52 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 10 November 2020 at 10:42:09 UTC, Vladimir 
 Panteleev wrote:
 But, by itself the GC doesn't add much latency to introduce 
 stutter in the UI - a GC scan is generally quick enough that 
 the UI doesn't feel laggy or stuttery. The problem is that the 
 GC is waiting for all threads to finish their ioctls, while 
 the program otherwise is completely suspended. This affects 
 not just UI, but throughput.

 Would a thread local GC with reference counted shared objects 
 work for your use case?

I don't think there is a simple answer here.

Removing the global GC lock for allocations, and allowing each 
thread to allocate from its own private pool, would greatly 
improve the performance of multi-threaded applications. For 
example, the global GC lock was what was preventing moving more 
processing in Dustmite to worker threads - currently, it's often 
better to keep everything in one thread for GC-dependent code 
instead of using worker threads specifically because of the 
overhead of the global GC lock. I think such a modification would 
be possible without radical changes to the language or GC design, 
but it's possible I'm missing something.

However, that wouldn't help in this case, because the problem 
here doesn't come from allocations, but from the stop-the-world 
aspect of the GC.

A theoretical non-stop-the-world GC would indeed help in this 
situation, but such a GC is only possible if you restrict the 
language to a subset, such that all copies of managed objects are 
always visible to the compiler. It would require all @system / 
extern(C) code to be carefully re-scrutinized. In short, this 
would essentially be a different language (based on D). I don't 
think we can get there from where we are now.

Nov 27 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev 
wrote:
 However, that wouldn't help in this case, because the problem 
 here doesn't come from allocations, but from the stop-the-world 
 aspect of the GC.

 A theoretical non-stop-the-world GC would indeed help in this 
 situation, but such a GC is only possible if you restrict the 
 language to a subset, such that all copies of managed objects 
 are always visible to the compiler. It would require all 
 @system / extern(C) code to be carefully re-scrutinized. In 
 short, this would essentially be a different language (based on 
 D). I don't think we can get there from where we are now.

Hm, but it would only stop a single thread. You would not be 
allowed to share nonpinned objects with other threads.

Nov 27 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Friday, 27 November 2020 at 10:26:18 UTC, Ola Fosheim Grostad 
wrote:
 On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev 
 wrote:
 However, that wouldn't help in this case, because the problem 
 here doesn't come from allocations, but from the 
 stop-the-world aspect of the GC.

 A theoretical non-stop-the-world GC would indeed help in this 
 situation, but such a GC is only possible if you restrict the 
 language to a subset, such that all copies of managed objects 
 are always visible to the compiler. It would require all 
 @system / extern(C) code to be carefully re-scrutinized. In 
 short, this would essentially be a different language (based 
 on D). I don't think we can get there from where we are now.

 Hm, but it would only stop a single thread. You would not be 
 allowed to share nonpinned objects with other threads.

Right, so that's another imposed limitation of such a GC. You'd 
still also lose the ability to memcpy or memset a struct that had 
managed pointers, as that would break the reference count that 
the GC relies on to work. It would definitely solve the 
performance problem, but it would be such a radical change that 
it would essentially be a different language (and debatedly no 
longer a system-programming one).

Nov 27 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Friday, 27 November 2020 at 10:31:21 UTC, Vladimir Panteleev 
wrote:
 Right, so that's another imposed limitation of such a GC. You'd 
 still also lose the ability to memcpy or memset a struct that 
 had managed pointers, as that would break the reference count 
 that the GC relies on to work. It would definitely solve the 
 performance problem, but it would be such a radical change that 
 it would essentially be a different language (and debatedly no 
 longer a system-programming one).

I think it is no different than shared_ptr. I also think one can 
add some safety through global pointer analysis for existing 
code.  Let the pinning be done by a counter, when you pin the 
object you get a smartpointer borrowed_ptr... when the count goes 
to zero, the object is local again.

Nov 27 2020

IGotD- <nise nise.com> writes:

On Friday, 27 November 2020 at 10:41:54 UTC, Ola Fosheim Grostad 
wrote:
 I think it is no different than shared_ptr. I also think one 
 can add some safety through global pointer analysis for 
 existing code.  Let the pinning be done by a counter, when you 
 pin the object you get a smartpointer borrowed_ptr... when the 
 count goes to zero, the object is local again.

Reference counting, which also means multiple ownership, doesn't 
play well with any borrowing mechanism. The reason is that the 
compiler cannot resolve the borrow checks at compile time and 
must insert runtime checks for whether you are allowed to borrow. 
This reduces the performance, probably not a lot but still. Let's 
leave borrow checker outside D and just have good old reference 
counting, that's what we need.

Speaking of parallel GC, even if we have atomic reference 
counting or other parallel method, the underlying malloc/free 
must also be non blocking or at least reduce the locking as much 
as possible. Many libc implementations have this already though.

Nov 27 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Friday, 27 November 2020 at 11:31:48 UTC, IGotD- wrote:
 Reference counting which also means multiple ownership doesn't 
 play well with any borrowing mechanism. Reason is that the 
 compiler cannot determine the borrow checker at compile time 
 and must insert runtime checks if you are allowed to borrow or 
 not. This reduces the performance, probably not a lot but 
 still. Let's leave borrow checker outside D and just have good 
 old reference counting, that's what we need.

You can view ARC as a borrowchecker. If the ARC optimizer 
succeeds globally then all acquire/release can be omitted for 
that type (or that call graph path).

The problem is interior pointers which would require fat pointers 
or borrowchecker...

But again you could rewrite those fat pointers if ARC 
optimization is highly successful (if the code validates like it 
would for a borrow checker)

Nov 27 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Friday, 27 November 2020 at 12:00:40 UTC, Ola Fosheim Grostad 
wrote:
 You can view ARC as a borrowchecker. If the ARC optimizer 
 succeeds globally then all acquire/release can be omitted for 
 that type (or that call graph path).

 The problem is interior pointers which would require fat 
 pointers or borrowchecker...

 But again you could rewrite those fat pointers if ARC 
 optimization is highly successful (if the code validates like 
 it would for a borrow checker)

Sadly, templated types that depend on struct size and field 
offsets could be a problem for such rewrites... So the compiler 
would have to annotate structs with dependencies...

Nov 27 2020
