digitalmars.D - phobos src level stats

Bruce Carneal <bcarneal gmail.com> writes:
Below you'll find category line percentages and total line counts 
for the 20 biggest files in phobos.

The "line" counts following the file names are of the 
dscanner/libdparse variety rather than the 'wc' variety.

On a 2.4 GHz Zen 1, libdparse managed all of phobos in a little 
under 1.5 seconds.  For comparison note that all files were read 
in a little under 10 milliseconds (from file cache).

I really enjoyed my first interaction with libdparse but I'm 
guessing that the maintainers there strongly favor 
clarity/correctness over speed.  One other note: compiling with 
dub's --combined option cut the execution time of the 
ldc2/release exe by about 2X.


total bytes 10918340
empty 18%   comments  9%   docs 17%   utst 32%   src 24%   range/package.d, 9610
empty 18%   comments  3%   docs 12%   utst 54%   src 13%   datetime/systime.d, 9351
empty 18%   comments  3%   docs 16%   utst 52%   src 11%   datetime/date.d, 8496
empty 12%   comments 10%   docs 19%   utst 18%   src 41%   uni/package.d, 8335
empty 21%   comments  3%   docs 34%   utst 37%   src  6%   datetime/interval.d, 8215
empty 13%   comments 13%   docs 17%   utst 24%   src 35%   math.d, 7679
empty 16%   comments  8%   docs 10%   utst 36%   src 30%   format.d, 6920
empty 16%   comments  7%   docs 16%   utst 42%   src 19%   traits.d, 6868
empty 17%   comments 11%   docs 18%   utst 35%   src 20%   typecons.d, 6784
empty 16%   comments  4%   docs 19%   utst 32%   src 29%   string.d, 5442
empty 19%   comments  9%   docs 14%   utst 35%   src 23%   algorithm/iteration.d, 5235
empty 16%   comments 11%   docs 11%   utst 39%   src 23%   conv.d, 4939
empty 16%   comments  7%   docs 25%   utst 22%   src 30%   stdio.d, 4396
empty 14%   comments  6%   docs 43%   utst  6%   src 31%   net/curl.d, 4167
empty 16%   comments  8%   docs 21%   utst 29%   src 26%   algorithm/searching.d, 4074
empty 14%   comments  9%   docs 18%   utst 22%   src 36%   algorithm/sorting.d, 4073
empty 20%   comments  4%   docs 23%   utst 25%   src 28%   file.d, 4071
empty 17%   comments  8%   docs 16%   utst 38%   src 22%   array.d, 3601
empty 20%   comments  9%   docs 29%   utst 12%   src 30%   parallelism.d, 3594
empty 20%   comments  4%   docs 14%   utst 43%   src 19%   bitmanip.d, 3457
Sep 22 2020
Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 22 September 2020 at 20:53:17 UTC, Bruce Carneal 
wrote:
 [...]
The empty line numbers seem a little high to me.  I may have a 
bug in the code for that:

    import std.string : lineSplitter;

    ulong countEmptyLines(string rawText) @nogc nothrow pure @safe
    {
        ulong empties;
        lineLoop: foreach (line; lineSplitter(rawText))
        {
            // a line counts as empty if it holds only spaces and tabs
            foreach_reverse (ch; line)
                if (ch != ' ' && ch != '\t')
                    continue lineLoop;
            ++empties;
        }
        return empties;
    }
Sep 22 2020
DlangUser38 <DlangUser38 nowhere.se> writes:
On Tuesday, 22 September 2020 at 21:01:17 UTC, Bruce Carneal 
wrote:
 [...]
You can count empty lines using a sliding window of two tokens over the token range. The difference between the line positions of the two tokens gives the number of empty lines between them. String literals and comments require special processing, but otherwise this is quite straightforward to implement.
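
The sliding-window idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Tok` is a hypothetical stand-in for a lexer token that records the 1-based lines where it starts and ends (libdparse tokens carry a start line, but the end line of a multi-line string or comment would have to be derived), and the token list is assumed to include comments and string literals so that their interior lines are never treated as gaps.

```d
/// Hypothetical minimal token: only its line span matters for this sketch.
struct Tok
{
    size_t firstLine; // 1-based line on which the token starts
    size_t lastLine;  // 1-based line on which the token ends
}

/// Counts the lines that fall strictly between consecutive tokens,
/// plus any lines before the first token and after the last one.
/// Assumes toks is sorted by position and totalLines covers the whole file.
size_t countEmptyLinesBetween(const(Tok)[] toks, size_t totalLines)
    @nogc nothrow pure @safe
{
    if (toks.length == 0)
        return totalLines;

    // lines before the first token
    size_t empties = toks[0].firstLine - 1;

    // window of two tokens: previous and current
    foreach (i; 1 .. toks.length)
    {
        immutable prevEnd = toks[i - 1].lastLine;
        immutable curStart = toks[i].firstLine;
        if (curStart > prevEnd + 1)
            empties += curStart - prevEnd - 1;
    }

    // lines after the last token
    if (totalLines > toks[$ - 1].lastLine)
        empties += totalLines - toks[$ - 1].lastLine;

    return empties;
}
```

Because a multi-line comment or string is a single token here, blank lines inside it never show up as a gap between two tokens, which is the "special processing" the approach needs.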
Sep 22 2020
Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 22 September 2020 at 23:22:42 UTC, DlangUser38 wrote:
 [...]
So, a way to stay in "token space" then. Don't see a problem with the above but will note for future apps that I need not drop back to raw text.
Sep 22 2020
Bruce Carneal <bcarneal gmail.com> writes:
On Wednesday, 23 September 2020 at 00:48:16 UTC, Bruce Carneal 
wrote:
 [...]
Ah, the problem would be empty lines within docs or comments: they are counted as empty even though they've already been accounted for in the 'docs' and 'comments' categories. I'll revise the code.
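
One possible shape for such a revision, staying in raw text, is sketched below. It is not the revised code from this thread. It assumes the caller already knows which line ranges belong to docs and comments (for example from the lexer), and `LineSpan` and `countEmptyLinesOutside` are hypothetical names introduced only for this sketch.

```d
import std.string : lineSplitter;

/// Hypothetical 1-based, inclusive line range covered by a doc or comment.
struct LineSpan
{
    size_t first;
    size_t last;
}

/// Counts blank lines (only spaces/tabs) that are not inside any span in skip.
ulong countEmptyLinesOutside(string rawText, const(LineSpan)[] skip)
    pure @safe
{
    ulong empties;
    size_t lineNo;
    foreach (line; lineSplitter(rawText))
    {
        ++lineNo;

        // blank means the line holds nothing but spaces and tabs
        bool blank = true;
        foreach (ch; line)
            if (ch != ' ' && ch != '\t')
            {
                blank = false;
                break;
            }
        if (!blank)
            continue;

        // skip blank lines already attributed to docs or comments
        bool inside = false;
        foreach (s; skip)
            if (s.first <= lineNo && lineNo <= s.last)
            {
                inside = true;
                break;
            }
        if (!inside)
            ++empties;
    }
    return empties;
}
```

The linear scan over `skip` keeps the sketch short; with many comment spans, a sorted list walked in step with the line counter would avoid the repeated search.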
Sep 22 2020