digitalmars.D.announce - D for BigData: the first BetterC library by Tamediadigital

test123 <test123 gmail.com> writes:

https://forum.dlang.org/post/hngfeheyklalzoxkyuwq@forum.dlang.org

On Saturday, 25 February 2017 at 14:32:00 UTC, Ilya Yaroshenko 
wrote:
 HyperLogLog++ is advanced cardinality estimation algorithm with 
 normal and compressed sparse representations. It can be used to 
 estimate approximate number of unique elements in an unordered 
 set.

 hll-d [1, 2] is written in D. It can be used as betterC library 
 without linking with DRuntime. hll-d has C header and C example.

 Its implementation is based on Mir Algorithm [3]
   1. mir.ndslice.topology.bitpack is used for arrays composed 
 of packed 6bit integers
   2. mir.ndslice.sorting.sort is used for betterC sorting.

 [1] Git: https://github.com/tamediadigital/hll-d
 [2] Dub: http://code.dlang.org/packages/hll-d
 [3] Mir Algorithm: https://github.com/libmir/mir-algorithm

 Best regards,
 Ilya

Thanks for the great work.

I checked the C API, but I cannot figure out how to get the count 
for a single element.

For example, if I use it as an IP counter, is there a way to know 
how many times a given IP has been added to the set?

May 01 2022

Cym13 <cpicard purrfect.fr> writes:

On Monday, 2 May 2022 at 05:22:07 UTC, test123 wrote:
 Thanks for the great work.

 I checked the C API, but I cannot figure out how to get the 
 count for a single element.

 For example, if I use it as an IP counter, is there a way to 
 know how many times a given IP has been added to the set?

No, that's not what this is for. HyperLogLog is useful if you 
have a big dataset that may contain duplicates and you want to 
know how many unique items it holds (to within a reasonable 
margin of error). For example, a website can use it to estimate 
how many visitors it has without storing every single IP address 
to check for duplicates on each new connection. The tradeoff is 
that it's probabilistic: since you don't store every address, you 
need much less space and time to get a count of unique IPs, but 
you have to accept a margin of error on that result, and you 
can't know what the IPs were in the first place, just how many of 
them there are.
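
The tradeoff described above can be illustrated with a small, self-contained sketch. This is plain HyperLogLog written in Python, not the HyperLogLog++ variant and not the hll-d implementation or its C API. The class name `TinyHLL`, the use of SHA-1 as the hash, and the register count (`p = 12`, i.e. 4096 registers) are all assumptions made purely for illustration.

```python
import hashlib
import math


class TinyHLL:
    """Minimal plain-HyperLogLog sketch (illustrative only, not hll-d)."""

    def __init__(self, p=12):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m  # fixed size, independent of input size

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash (an arbitrary choice here).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)       # small-range (linear counting) correction
        return raw


# 20,000 distinct addresses, each seen five times.
hll = TinyHLL()
ips = [f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}" for i in range(20_000)]
for _ in range(5):
    for ip in ips:
        hll.add(ip)

print(round(hll.count()))  # an estimate close to 20000, not an exact count
```

The sketch keeps only 4096 small integers no matter how many addresses are added, and duplicates never change the result. There is nothing in the registers from which an individual address or its number of occurrences could be recovered.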

May 01 2022

test123 <test123 gmail.com> writes:

On Monday, 2 May 2022 at 06:17:17 UTC, Cym13 wrote:
 No, that's not what this is for. HyperLogLog is useful if you 
 have a big dataset that may contain duplicates and you want to 
 know how many unique items it holds (to within a reasonable 
 margin of error). For example, a website can use it to estimate 
 how many visitors it has without storing every single IP 
 address to check for duplicates on each new connection. The 
 tradeoff is that it's probabilistic: since you don't store 
 every address, you need much less space and time to get a count 
 of unique IPs, but you have to accept a margin of error on that 
 result, and you can't know what the IPs were in the first 
 place, just how many of them there are.

Thanks for the quick answer.

So you mean that with HyperLogLog I cannot get a count for each 
IP, only the number of distinct IPs that have been added to the 
set?
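
Going by the explanation quoted above, that reading appears to be right: the sketch yields only an estimate of the number of distinct items. For comparison, here is a minimal Python illustration of what a per-IP count would need instead, namely an exact counter keyed by address. The sample addresses are made up, and this is not part of hll-d.

```python
from collections import Counter

# Hypothetical visit log: the same address may appear many times.
visits = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]

per_ip = Counter(visits)   # exact hits per address; memory grows with distinct IPs
distinct = len(per_ip)     # exact number of unique addresses

print(per_ip["10.0.0.1"])  # 3
print(distinct)            # 3
```

The counter answers both questions exactly but has to store every address, which is the cost a HyperLogLog sketch avoids by giving up the per-item information.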

May 02 2022
