
digitalmars.D - language feature usage statistics

reply aliak <something something.com> writes:
So this is something I've been wondering about for a while. And I 
don't believe I've seen any compiler with this, but have people 
ever thought about putting telemetry in compilers? How do people 
feel about having feature usage stats from dmd?

I can understand that networking from a compiler just sounds bad, 
but there are other ways around it, e.g. write a file instead and 
ask the dev to email it, or ask for permission before turning it 
on and sending it, or only do it in debug mode. I dunno, just 
spitballing here.
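
A minimal sketch of the "write a file, only with permission" idea, assuming the counts are kept in a local JSON file and that opt-in is signalled by an environment variable. The variable name `DMD_USAGE_STATS`, the function `recordUsage`, and the file layout are all hypothetical; dmd has no such feature.

```d
import std.file : exists, readText, write;
import std.json : JSONValue, parseJSON;
import std.process : environment;

/// Adds `counts` to the totals stored in the JSON file at `path`.
/// Does nothing unless the user opted in by setting the (hypothetical)
/// environment variable DMD_USAGE_STATS to "1".
/// Returns true if the file was written.
bool recordUsage(string path, const size_t[string] counts)
{
    if (environment.get("DMD_USAGE_STATS", "0") != "1")
        return false; // no opt-in, so nothing is stored anywhere

    // Start from the existing totals, or from an empty JSON object.
    JSONValue totals = exists(path) ? parseJSON(readText(path))
                                    : parseJSON("{}");

    foreach (feature, n; counts)
    {
        long previous = 0;
        if (auto p = feature in totals)
            previous = p.integer;
        totals[feature] = JSONValue(previous + cast(long) n);
    }

    // Nothing is sent over the network; the file stays on disk until
    // the user decides to share it.
    write(path, totals.toString());
    return true;
}
```

Keeping everything in a plain file means the user can read exactly what would be shared before emailing it.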

But having actual usage statistics would take away so many 
assumptions people have about how features are used, how often 
they're used, which features are not used, etc. (of course, if 
there are no statistics on a feature it doesn't mean it's never 
used). Data like this is very actionable, and is how any 
(probably non-enterprise) product is built these days (even 
vscode, for example, has an option to send usage stats).

Crash reporting is another thing. When the compiler crashes, that 
can be sent somewhere (again, with user permission).

Things that can be answered:
* which feature is not used and can be cut
* which feature is used the most and should be enhanced, fixed, 
polished
* which combinations of features are used together => can they be 
unified?

The next time someone says they don't think lazy is useful, we 
can point to actual data.

And then for example, from the features that are hardly used, we 
can start asking why they are not used. If we know why then 
future features that may contain the same base assumptions that 
led to the creation of the unused features can be avoided.

Figuring out why the features are unused, or hardly used, can 
also better enable us to make the feature usable.

These kinds of stats can also be collected on any symbols that 
are loaded from std, for example, and then we can also get a feel 
for which functions and modules are used from Phobos.

Anyway, I'm not sure about others, but if it'd make D a better 
language than the competition, I'd gladly trust dmd to send stats 
to a place the D Language Foundation controls.

Cheers
- ali
Oct 18 2019
next sibling parent Les De Ridder <les lesderid.net> writes:
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 So this is something I've been wondering about for a while. And 
 I don't believe I've seen any compiler with this, but have 
 people ever thought about putting telemetry in compilers? How 
 do people feel about having feature usage stats from dmd?

 [...]
I vaguely remember there being a tool that generated such statistics from the source code of packages registered on code.dlang.org, but I might be mistaken.
Oct 18 2019
prev sibling next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
I've actually considered doing this with dpldocs.info before. It 
would only hit public code but... I have copies of basically the 
whole dub repo and code that is already custom parsing it so I 
could possibly pull info like this when it does its updates.

Though my parser doesn't always keep up with new features (I 
often skip function bodies since it isn't super important for 
documentation purposes) it still mostly works.
Oct 18 2019
prev sibling next sibling parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 So this is something I've been wondering about for a while. And 
 I don't believe I've seen any compiler with this, but have 
 people ever thought about putting telemetry in compilers? How 
 do people feel about having feature usage stats from dmd?

 [...]
Hear, hear! +1
Oct 18 2019
prev sibling next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 But, having actual usage statistics will take away so many 
 assumptions people have about how features are used, how often 
 they're used, which features are not used, etc
Totally. I've started toying with this by cloning all packages on Dub and running libdparse over them. Turns out a shallow clone takes only ~4 GB total, and a deep clone ~7 GB I believe. I've already used it a few times to support my cases.

I made this:
https://gist.github.com/dkorpel/10cc13d0740c50a8aab30588f392950f
For this:
https://github.com/dlang/DIPs/blob/9ca12cc89dadc10f2abfb8a98bf4d52ed8679c2a/DIPs/DIP1NNN-DK.md

I made this:
https://gist.github.com/dkorpel/df2c2f567588bb8ee59e293146e52723
For this:
https://github.com/dlang/dmd/pull/10236

These were bodged together, but I plan to make something more general and polished once I allocate some time for it. Building telemetry options into DMD is something I don't plan to do, but if someone else champions that I'd be in favor!
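
As a rough illustration of the kind of count such a scan produces, here is a naive sketch that does not use libdparse at all: it splits source text into identifier-like words and tallies the keywords of interest. The function name `countKeywords` is made up for this example, and because there is no real parsing it also counts matches inside comments and string literals, which a libdparse-based tool would avoid.

```d
import std.algorithm : splitter;
import std.ascii : isAlphaNum;

/// Counts how often each entry of `keywords` appears as a whole word
/// in `source`. Naive: matches in comments and strings are counted too.
size_t[string] countKeywords(string source, string[] keywords)
{
    size_t[string] counts;
    foreach (kw; keywords)
        counts[kw] = 0; // keywords that never occur still show up with 0

    // Split on every character that cannot be part of an identifier,
    // so `lazy_x` stays one word and is not mistaken for `lazy`.
    foreach (word; source.splitter!(c => !(isAlphaNum(c) || c == '_')))
    {
        if (auto p = word in counts)
            ++*p;
    }
    return counts;
}
```

Calling `countKeywords(readText(file), ["lazy", "mixin"])` for every `.d` file in the cloned repositories and summing the results would give a first, crude usage table.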
Oct 18 2019
parent reply aliak <something something.com> writes:
On Friday, 18 October 2019 at 20:53:50 UTC, Dennis wrote:
 On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 [...]
That is great! Which APIs did you use to get all D project links? Does dub provide something? And I'm curious, were you rate limited by GitHub (I'm assuming this was the work of a for loop)?
Oct 20 2019
parent Dennis <dkorpel gmail.com> writes:
On Sunday, 20 October 2019 at 21:29:24 UTC, aliak wrote:
 Which APIs did you use to get all d project links? Does dub 
 provide something?
There might be an API, but I simply parsed the HTML pages.
First I get the identifiers of all packages:
```
import std.net.curl;
import std.regex;

// Fetch the package index and collect every package identifier it links to.
string page = get("http://code.dlang.org/?sort=added&category=&skip=0&limit=2000").idup;
string[] result;
foreach (m; page.matchAll(regex(`packages/([a-zA-Z0-9_-]+)`)))
    result ~= m[1];
```
Then I parse the package pages for the repository link with htmld:
```
import std.conv : text;
import html; // http://code.dlang.org/packages/htmld

// Returns the repository URL found on a package page, or null if there is none.
string getRepo(string packageName) {
    string page = get("http://code.dlang.org/packages/" ~ packageName).idup;
    auto doc = createDocument(page);
    if (auto p = doc.querySelector("#repository")) {
        if (auto m = p.html.matchFirst(`href="([^"]+)`)) {
            return m[1].text;
        }
    }
    return null; // no repository link on this page
}
```
 And curious, were you rate limited by github (i'm assuming this 
 was the work of a for loop?).
I wouldn't have been surprised if I got a timeout for cloning 1600 repositories in succession, but I didn't. (I suppose the same happens when installing your average NPM package, lol)
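
A minimal sketch of what the cloning loop could look like, assuming `git` is on the PATH and that each checkout is named after the last segment of its URL. The helpers `shallowCloneArgs` and `cloneAll` are hypothetical names for this example, not the code actually used, and two repositories with the same name from different owners would collide in this layout.

```d
import std.path : baseName, buildPath;
import std.process : execute;
import std.string : chomp;

/// Builds the argument list for a shallow clone of `repoUrl` into `destRoot`.
string[] shallowCloneArgs(string repoUrl, string destRoot)
{
    // ".../user/pkg.git" and ".../user/pkg" both end up in "<destRoot>/pkg".
    string name = repoUrl.baseName.chomp(".git");
    return ["git", "clone", "--depth", "1", repoUrl, buildPath(destRoot, name)];
}

/// Clones every repository in turn and returns the URLs that failed.
string[] cloneAll(string[] repoUrls, string destRoot)
{
    string[] failed;
    foreach (url; repoUrls)
    {
        // execute() waits for git to finish, so clones run one at a time.
        auto result = execute(shallowCloneArgs(url, destRoot));
        if (result.status != 0)
            failed ~= url;
    }
    return failed;
}
```

`--depth 1` is what makes the clone shallow; dropping those two arguments gives the full-history clone mentioned earlier in the thread.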
Oct 20 2019
prev sibling parent reply matheus <matheus gmail.com> writes:
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 ...
 I don't believe I've seen any compiler with this, but have 
 people ever thought about putting telemetry in compilers?
 ...
I'm pretty sure Visual Studio does this:
https://code.visualstudio.com/docs/getstarted/telemetry

Matheus.
Oct 20 2019
parent aliak <something something.com> writes:
On Sunday, 20 October 2019 at 22:08:46 UTC, matheus wrote:
 On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 ...
 I don't believe I've seen any compiler with this, but have 
 people ever thought about putting telemetry in compilers?
 ...
 I'm pretty sure Visual Studio does this:
 https://code.visualstudio.com/docs/getstarted/telemetry
 Matheus.
Aye, VSCode does this (I actually mentioned it in my original post):

On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
 (probably non-enterprise) product is built these days (even 
 vscode for example has an option to send usage stats).
Oct 20 2019