digitalmars.D.announce - Command line tool for weighted reservoir sampling
- Jon Degenhardt (14/14) Jan 22 2017
I released a new tool for weighted random sampling of tabular data files: tsv-sample. It's one of several tools recently added to the TSV file toolkit I released last year. These tools are especially useful when data files are too large to comfortably read entirely into memory in R and similar apps.

I'll publish an announcement of the broader set of tool updates in the next few weeks. I have some performance benchmarks to finish first. However, weighted reservoir sampling algorithms are interesting, so I thought there might be enough interest to warrant a separate announcement.

Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d

--Jon
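
For readers unfamiliar with the technique, here is a minimal Python sketch of one common weighted reservoir sampling scheme (the Efraimidis-Spirakis "A-Res" approach). It is an illustration only and is not taken from tsv-sample; the function name, the (item, weight) input shape, and the decision to skip non-positive weights are all assumptions made for this example. Each row gets a random key u^(1/w), and the k rows with the largest keys are kept in a min-heap, so the input is read once and only k rows are held in memory.

```python
import heapq
import random


def weighted_reservoir_sample(rows, k, rng=None):
    """Return up to k items drawn without replacement from (item, weight) pairs.

    Hypothetical helper for illustration; not the tsv-sample implementation.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); the smallest key sits at heap[0]
    for index, (item, weight) in enumerate(rows):
        if weight <= 0:
            continue  # assumption: rows with non-positive weight are never sampled
        # A larger weight pushes the key toward 1, making the row more likely to stay.
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, index, item))
        elif key > heap[0][0]:
            # Evict the current smallest key in favor of this row.
            heapq.heapreplace(heap, (key, index, item))
    # Report the kept items from largest key to smallest.
    return [item for _, _, item in sorted(heap, reverse=True)]
```

Because the rows are consumed as a stream, the same function could be fed from a file iterator that yields one (line, weight) pair at a time, which is what makes this style of sampling attractive for files too large to load at once.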
Jan 22 2017