digitalmars.D - Some (probably inaccurate) stats about Dub packages

SealabJaster <sealabjaster gmail.com> writes:
(I know I'm posting a lot lately, but I'm simply curious :D)

To pass the time I've been writing a thingy to show the downloads 
of dub packages over time.

I thought to myself "how many dub packages are actually used?"

So here are a few stats I've compiled. The Weekly and Monthly 
snapshots were taken 1 day ago:

```
TOTAL		PACKAGES		PERCENTAGE
1000000		5			0.24%
100000 		41			2.01%
10000 		165			7.90%
1000 		375			17.96%
100 		1058 			50.67%
10 		1619 			77.54%
1 		2018			96.65%

MONTHLY		PACKAGES		PERCENTAGE
10000 		13 			0.63%
1000 		48 			2.30%
100 		113			5.41%
10 		287 			13.75%
5 		367			17.58%
1 		657 			31.47%

WEEKLY		PACKAGES		PERCENTAGE
1000 		20 			0.98%
100 		70 			3.35%
10 		149 			7.14%
5 		191 			9.15%
1 		309 			14.80%
```

This is just off of data that I've scraped from dub, hence this 
likely isn't too accurate.
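
For anyone wanting to reproduce tables like the ones above, here's a minimal sketch of how the threshold counts could be computed. It assumes each row means "packages with at least N downloads" and that you already have one download count per package. The function name and the sample counts are made up for illustration; they are not the scraped data.

```python
def threshold_table(download_counts, thresholds):
    """Return (threshold, packages, percentage) rows.

    A package is counted in a row when its download count is >= the
    row's threshold, so rows are cumulative (assumed reading of the table).
    """
    total_packages = len(download_counts)
    rows = []
    for threshold in sorted(thresholds, reverse=True):
        matching = sum(1 for count in download_counts if count >= threshold)
        percentage = 100.0 * matching / total_packages if total_packages else 0.0
        rows.append((threshold, matching, round(percentage, 2)))
    return rows


# Hypothetical counts for eight packages, NOT real registry data.
sample_counts = [0, 3, 12, 150, 150, 2_400, 98_000, 1_200_000]

for threshold, packages, percentage in threshold_table(sample_counts, [1, 10, 100, 1000]):
    print(f"{threshold}\t\t{packages}\t\t\t{percentage:.2f}%")
```

Because the rows are cumulative, a package with 2,400 downloads shows up in the 1000, 100, 10 and 1 rows alike.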

But when I look at these numbers I personally think "is it even 
worth wasting time making a dub library?"

I wonder what the average age of the most used/median used/least 
used packages is, and what categories they'd fall under.

Hopefully others find this of interest, even if it's not very 
enlightening on its own.
Oct 21 2021
drug <drug2004 bk.ru> writes:
21.10.2021 12:00, SealabJaster wrote:
 
 But when I look at these numbers I personally think "is it even worth 
 wasting time making a dub library?"
Of course it is. It is not easy to interpret statistics properly; without a proper model they are just numbers that don't make much sense.
Oct 21 2021
SealabJaster <sealabjaster gmail.com> writes:
On Thursday, 21 October 2021 at 09:09:00 UTC, drug wrote:
 Of course it is. It is not easy to interpret statistics 
 properly; without a proper model they are just numbers that 
 don't make much sense.
Yeah, you're right. I feel I'm just projecting my own demoralisation, sorry.
Oct 21 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
 (I know I'm posting a lot lately, but I'm simply curious :D)

 To pass the time I've been writing a thingy to show the 
 downloads of dub packages over time.

 [...]
Don't forget: adding it to dub makes it discoverable in one place. That's a huge advantage in any case.
Oct 21 2021
SealabJaster <sealabjaster gmail.com> writes:
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
 ...
Didn't want to spam the forum with yet another thread, so I'll just post this here since it's at least slightly on topic: "TL;DR I need a search engine for dub packages for a small thing I'm working on, and wanted a side-by-side comparison of Postgres (the database I'm using) and Meilisearch (the dedicated search engine I've been eyeing up)." It's a repo that sets up meilisearch and postgres with package data from dub, in order to see which one gives me better results from queries. Thought someone might find it interesting enough to look at: https://github.com/BradleyChatha/dubsearchtest
Oct 21 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 21 October 2021 at 13:28:50 UTC, SealabJaster wrote:
 On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster 
 wrote:
 ...
 [...]
Maybe possible to integrate with https://github.com/dlang/dub-registry/pull/497
Oct 22 2021
SealabJaster <sealabjaster gmail.com> writes:
On Friday, 22 October 2021 at 07:07:01 UTC, Imperatorn wrote:
 Maybe possible to integrate with 
 https://github.com/dlang/dub-registry/pull/497
I don't have any motivation to work on Dub, so someone else would have to champion something like this through.
Oct 22 2021
WebFreak001 <d.forum webfreak.org> writes:
On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
 On Friday, 22 October 2021 at 07:07:01 UTC, Imperatorn wrote:
 Maybe possible to integrate with 
 https://github.com/dlang/dub-registry/pull/497
 I don't have any motivation to work on Dub, so someone else would have to champion something like this through.
That PR is completely different: it removes the internal MongoDB search and replaces it with "packageName.canFind(query)", so it is not the place you would want to extend this search into. Additionally, that PR fails to address problems that were brought up in review. It also mixes the search changes (which I majorly disapprove of as they are right now) with bug fixes and deprecation fixes, which should really get into the code base, but which I won't merge as long as they are bundled with the search changes.

As I have already commented in that PR, it should really _extend_ the MongoDB text search, not replace it. Make it an aggregate query with a MongoDB $regex match (that makes it a simple string contains; escape regex characters with std.regex) that merges with the text search, giving the regex results higher text scores.

If you, the person reading this, are motivated to push through a search improvement in DUB, you can check that PR for hints on where to start, but you should really create a new PR instead and let MongoDB do the work, or introduce some other indexed search framework.
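
A rough sketch of that idea, in Python rather than the registry's D code: build a `$text` filter plus a `$regex` filter on the escaped query, then merge the two result sets so that substring hits rank first. The `name` field, the boost value and the shape of the results are assumptions for illustration only. The merge is done client-side here just to show the ranking; in the registry it would ideally happen inside a single aggregate query.

```python
import re

REGEX_BOOST = 100.0  # assumed bonus so substring hits outrank pure text hits


def build_filters(query):
    """Build the two MongoDB filter documents (plain dicts, no driver needed)."""
    text_filter = {"$text": {"$search": query}}
    # re.escape turns the user input into a literal, i.e. a "string contains".
    # Assumption: Python's escaping is acceptable to MongoDB's regex engine.
    regex_filter = {"name": {"$regex": re.escape(query), "$options": "i"}}
    return text_filter, regex_filter


def merge_results(text_hits, regex_hits):
    """Merge two {package_name: score} mappings, boosting regex hits."""
    merged = dict(text_hits)
    for name, score in regex_hits.items():
        merged[name] = merged.get(name, 0.0) + score + REGEX_BOOST
    return sorted(merged, key=lambda name: merged[name], reverse=True)


text_filter, regex_filter = build_filters("vibe.d")
print(regex_filter)

# Hypothetical scores standing in for what the database would return.
ranked = merge_results({"vibe-core": 1.5, "http-server": 2.0}, {"vibe-core": 0.0})
print(ranked)
```

Escaping matters because package names often contain characters such as `.` that would otherwise be treated as regex syntax.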
Oct 22 2021
SealabJaster <sealabjaster gmail.com> writes:
On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
 If you, the person reading this, are motivated in pushing 
 through a search improvement in DUB, you can check that PR for 
 hints where to start, but you should really create a new PR 
 instead and let MongoDB do the work or introduce some other 
 indexed search framework.
Meilisearch has acceptable results and attempts to fix typos as well. It seems lightweight too. I kind of liked the results Postgres was giving me, but I doubt it does anything about typos, and you have to be a bit specific. And of course dub doesn't use Postgres >x3

I wanted to add Elasticsearch/OpenSearch into the test as well, but couldn't be bothered since it seemed redundant. In other words, if anyone's gonna do this, a dedicated search engine would likely be the way forward?
Oct 22 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
 On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
 [...]
 that PR is completely different and removes the internal MongoDB search, replacing it with "packageName.canFind(query)" - that is not the place you would want to extend this search into. [...]
It was mainly done as the current search is unusable as everyone knows.
Oct 22 2021
WebFreak001 <d.forum webfreak.org> writes:
On Friday, 22 October 2021 at 12:48:19 UTC, Imperatorn wrote:
 On Friday, 22 October 2021 at 11:47:09 UTC, WebFreak001 wrote:
 On Friday, 22 October 2021 at 11:17:11 UTC, SealabJaster wrote:
 [...]
 that PR is completely different and removes the internal MongoDB search, replacing it with "packageName.canFind(query)" - that is not the place you would want to extend this search into. [...]
 It was mainly done as the current search is unusable as everyone knows.
I think it's ok for searching for functionality - if you start searching for package names it becomes worse, especially if they are made-up words or contain punctuation.

Your improvement is great and I would love to put it in, but the implementation is not good as it is right now:

- it's getting worse results if you typo or have different tense or form of words
- it first fetches all the documents into memory and then filters on them (which will take longer and longer the more packages we have)
- the PR is mixed with unrelated deprecation fixes which should really be done separately (but should definitely be done!)
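
To illustrate the typo point, here is a small sketch contrasting a plain substring check (roughly what `canFind` on the package name does) with a fuzzy match. It uses Python's `difflib` purely as a stand-in for a typo-tolerant search engine; the package list and the queries are made-up examples.

```python
import difflib

# Hypothetical package names, standing in for the registry contents.
package_names = ["vibe-d", "mir-algorithm", "arsd-official"]


def substring_search(query, names):
    """Plain 'contains' match on the package name."""
    return [name for name in names if query in name]


def fuzzy_search(query, names):
    """Similarity-based match that tolerates small typos."""
    return difflib.get_close_matches(query, names, n=3, cutoff=0.6)


print(substring_search("vibe", package_names))    # exact substring: found
print(substring_search("vbie-d", package_names))  # transposed letters: nothing
print(fuzzy_search("vbie-d", package_names))      # typo still finds a candidate
```

Both functions here scan an in-memory list, so this only shows the matching behaviour, not the indexing that a database or search engine would add.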
Oct 22 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 22 October 2021 at 13:43:17 UTC, WebFreak001 wrote:
 On Friday, 22 October 2021 at 12:48:19 UTC, Imperatorn wrote:
 [...]
Yeah, it's a proof of concept for someone to continue on. More to break the status quo.
Oct 22 2021
Abdulhaq <alynch4047 gmail.com> writes:
On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
 ```
 TOTAL		PACKAGES		PERCENTAGE
 1000000		5			0.24%
 100000 		41			2.01%
 10000 		165			7.90%
 1000 		375			17.96%
 100 		1058 			50.67%
 10 		1619 			77.54%
 1 		2018			96.65%
 ```
 [...]
 Hopefully others find this of interest, even if it's not very 
 enlightening on its own.
It's interesting, I suspect that the general profile is very typical of open source projects. In terms of absolute figures I'm not sure what to make of it. Personally if I had 10 different users of my project I'd consider it a great success, so....
Oct 22 2021
drug <drug2004 bk.ru> writes:
22.10.2021 14:35, Abdulhaq wrote:
 On Thursday, 21 October 2021 at 09:00:09 UTC, SealabJaster wrote:
 ```
 TOTAL        PACKAGES        PERCENTAGE
 1000000        5            0.24%
 100000         41            2.01%
 10000         165            7.90%
 1000         375            17.96%
 100         1058             50.67%
 10         1619             77.54%
 1         2018            96.65%
 ```
 [...]
 Hopefully others find this of interest, even if it's not very 
 enlightening on its own.
 It's interesting, I suspect that the general profile is very typical of open source projects. In terms of absolute figures I'm not sure what to make of it. Personally if I had 10 different users of my project I'd consider it a great success, so....
Me too. One package can have 1000 downloads/month but be a mechanical dependency performing trivial work. Another package may have been downloaded only 100 times in total, yet be directly and actively used (with PRs/issues) by different people/teams for complex tasks. I believe these figures don't really mean anything.
Oct 22 2021
Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 22 October 2021 at 11:54:44 UTC, drug wrote:
 Me too. One package can have 1000 downloads/month but it can be 
 mechanical dependency performing trivial work.
Yeah, most of the top ones are just dependencies of each other. Also note that "downloads" here just means any time something asks for the download URL. CI runs for certain packages account for 90%+ of these requests. I kinda wanna track the dependency thing, gonna make a local database....
Oct 22 2021