
digitalmars.D.announce - dhtslib v0.12.0 (high-throughput sequencing library)

James Blachly <james.blachly gmail.com> writes:
I'm delighted to finally post an official announcement of our package
for high-throughput sequencing (HTS), also called next-generation
sequencing (NGS): `dhtslib`. It's not a very clever name, and we are
working on a new one. ;)

https://github.com/blachlylab/dhtslib/
https://code.dlang.org/packages/dhtslib

Once upon a time, BioD[1] was fairly active, but I am afraid D is not
heavily used in bioinformatics and computational biology, especially in
high-throughput (genome) sequencing applications, compared to its
peers.[2] However, our group (cancer genomics) has found D an ideal
language: it is easy for Python programmers to pick up, yet retains
powerful features for C/C++ programmers.

`dhtslib` began as a thin wrapper over the ubiquitous, but very
low-level and hard-to-use, `htslib` C library
(https://github.com/samtools/htslib/). We use `dhtslib` extensively in
both public and private projects for computational biology, and over the
years it has grown from simply a (huge) set of `extern (C)` definitions
to a fully featured, RAII-enabled bioinformatics package focused on
genome sequencing. If you are working in this field, or know
someone open to D who works in this field, I strongly encourage you to
point them at `dhtslib`!

  * `htslib` namespace with complete bindings to htslib
  * `dhtslib` namespace with high-level object-oriented interfaces, many
    using underlying htslib calls for high performance, but via convenient
    and idiomatic D including RAII, forward ranges, etc.
  * htslib-backed read/write of SAM/BAM/CRAM, VCF/BCF
  * Readers for BED and GFF3/GTF (not part of htslib)
  * FASTQ streamer
  * CIGAR manipulations
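
As a rough illustration of what "RAII plus ranges over a C handle" can look like in D, here is a minimal sketch. It is not dhtslib's API: `LineReader` and `countLines` are hypothetical names, a plain C `FILE*` stands in for an htslib handle, and the range is only an input range rather than a forward range.

```d
import core.stdc.stdio : FILE, fopen, fclose, fgets;
import std.string : toStringz, fromStringz;

/// Hypothetical RAII wrapper: owns a C handle and exposes it as an input range.
struct LineReader
{
    private FILE* fp;        // raw C handle owned by this struct
    private char[4096] buf;  // scratch buffer; longer lines arrive in pieces
    private bool done;

    @disable this(this);     // no copies, so the handle is closed exactly once

    this(string path)
    {
        fp = fopen(path.toStringz, "r");
        if (fp is null)
            throw new Exception("cannot open " ~ path);
        popFront();          // prime the first line
    }

    ~this()
    {
        if (fp !is null)
            fclose(fp);      // runs automatically when the struct leaves scope
    }

    // Input-range primitives
    bool empty() const { return done; }

    // Current line, including its trailing newline if one was read
    const(char)[] front() const { return fromStringz(buf.ptr); }

    void popFront()
    {
        done = fgets(buf.ptr, cast(int) buf.length, fp) is null;
    }
}

/// Hypothetical helper: counts lines; the file is closed on return.
size_t countLines(string path)
{
    auto reader = LineReader(path);
    size_t n;
    // Manual loop: foreach would try to copy the non-copyable struct.
    for (; !reader.empty; reader.popFront())
        ++n;
    return n;
}

void main()
{
    import std.file : write, remove;
    import std.stdio : writeln;

    enum path = "raii_demo.txt";
    write(path, "chr1\t100\nchr2\t200\n");
    scope (exit) remove(path);

    writeln(countLines(path));
}
```

The point of the pattern is that callers never call `fclose` themselves: the destructor releases the C resource, and the range primitives let the handle be consumed with ordinary D loops and algorithms.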

The next version, v0.13.0, adds a novel feature "Typesafe Coordinates", 
which I'll post about separately in a moment!

Kind regards

James S Blachly, MD
The Ohio State University

[0] https://github.com/blachlylab/dhtslib/
     https://code.dlang.org/packages/dhtslib
[1] https://github.com/biod/BioD
[2] Here is a contemporary example of D used in high-throughput
sequencing: DENTIST by Arne Ludwig at Max Planck Institute
     https://github.com/a-ludi/dentist -- if you know of more, please
let me know!
Aug 31
Johan <j j.nl> writes:
On Wednesday, 1 September 2021 at 05:27:38 UTC, James Blachly 
wrote:
 I'm delighted to finally post an official announcement of our 
 package for high-throughput sequencing (HTS), also called 
 Next-generation sequencing (NGS): `dhtslib`. It's not a very 
 clever name, and we are working on a new one. ;)

 https://github.com/blachlylab/dhtslib/

 [...]

 [2] Here is a contemporary example of D used in high-throughput 
 sequencing: DENTIST by Arne Ludwig at Max Planck institute
     https://github.com/a-ludi/dentist

I am surprised to see the use of DMD (see the Dockerfile). If you want
runtime performance, the first thing I would do is switch to LDC or GDC.
Perhaps DENTIST's particular use of D and dhtslib is mainly forwarding
calls to htslib (C) and thus D performance is not relevant?

-Johan
Sep 02
James Blachly <james.blachly gmail.com> writes:
On Thursday, 2 September 2021 at 10:32:19 UTC, Johan wrote:
 On Wednesday, 1 September 2021 at 05:27:38 UTC, James Blachly
 wrote:
 [2] Here is a contemporary example of D used in 
 high-throughput sequencing: DENTIST by Arne Ludwig at Max 
 Planck institute
     https://github.com/a-ludi/dentist
 I am surprised to see the use of DMD (see the Dockerfile). If you
 want runtime performance, the first thing I would do is switch to
 LDC or GDC. Perhaps DENTIST's particular use of D and dhtslib is
 mainly forwarding calls to htslib (C) and thus D performance is
 not relevant?

DENTIST is someone else's unrelated project that does not, to my
knowledge, use `dhtslib`.
Sep 02