
digitalmars.D.announce - Command line tool for weighted reservoir sampling

I released a new tool for weighted random sampling of tabular 
data files: tsv-sample. It's one of several tools recently added 
to the tsv file toolkit I released last year. These tools are 
especially useful when data files are larger than is desirable to 
read entirely into memory in R and similar apps.

I'll publish an announcement of a broader set of tool updates in 
the next few weeks. I have some performance benchmarks to finish 
first. However, weighted reservoir sampling algorithms are 
interesting, so I thought there might be enough interest to 
warrant a separate announcement.
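
For anyone who hasn't run into the technique, here is a minimal 
Python sketch of one well-known weighted reservoir sampling 
scheme (the Efraimidis-Spirakis "A-Res" approach): each item gets 
a random key u^(1/w), and the k items with the largest keys are 
kept in a single pass. This is an illustration only and not 
necessarily how tsv-sample is implemented; the function name, its 
parameters, and the sample rows are all hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=random):
    """Pick up to k values from (value, weight) pairs in one pass.

    Hypothetical helper for illustration; not taken from tsv-sample.
    Items with larger weights are more likely to be selected.
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (key, index, value); smallest key on top
    for index, (value, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        # Random key u^(1/w): a larger weight pushes the key toward 1.
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, value)  # index breaks ties between equal keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    # Return the selected values, largest key first.
    return [value for _, _, value in sorted(heap, reverse=True)]


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 0.5)]
    print(weighted_reservoir_sample(rows, 2))
```

Only k entries are held in memory at any time, which is why this 
style of algorithm suits files too large to load whole.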

Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code: 
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d

--Jon
Jan 22