I am developing a genetics library for Boost that performs indexed searches on DNA data.
Would anyone be interested in helping me through the Boost submission process? I come from a high-performance computing and compiler background, and I need serious bioinformaticians like yourselves to help guide the design.
https://github.com/andy-thomason/genetics
The library is currently in use as the back end to a CRISPR system.
The library is:
- Readable (very few single letter variables!)
- C++11/14 compliant
- Inline (no linking required)
- Fast (100k matches per second on a typical server)
- Memory mapped (instant access to files)
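For anyone unfamiliar with why memory mapping gives "instant" access: the operating system maps the file into the address space and pages data in only when it is touched, so opening a multi-gigabyte index costs almost nothing up front. The following is a minimal POSIX-only sketch of the idea. `MappedFile` is a hypothetical name for illustration, not a class from the library, and the library's own implementation may differ.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical illustration (not the library's API): a read-only view of a
// file whose bytes are paged in lazily by the operating system.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);

        struct stat info;
        if (::fstat(fd, &info) != 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(info.st_size);

        // mmap rejects zero-length mappings, so leave data_ null for empty files.
        if (size_ > 0) {
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot map " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Constructing a `MappedFile` returns immediately regardless of file size; the cost is paid page by page as the search touches the data.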
To try the library, do a recursive clone of the main Boost repository and then clone this library into "libs/genetics". Build using the "b2" tool, which is part of Boost.
Example use cases:
- Find all 100 bp string matches in the rat genome with up to three errors.
- Find all 20 bp string matches in the rat genome with up to six errors (this is hard!).
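To illustrate why the second case is so much harder than the first, here is a minimal sketch of one common way to do indexed search with errors: pigeonhole seeding. It assumes "errors" means substitutions only, and it is not the library's actual algorithm or API; `build_index`, `find_approx` and `mismatches` are hypothetical names. If a pattern may contain at most k substitutions, then splitting it into k + 1 disjoint seeds guarantees that at least one seed matches exactly, so an exact-match index can propose candidates that are then verified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index: every seed_len-mer of the text mapped to its positions.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

SeedIndex build_index(const std::string& text, std::size_t seed_len) {
    SeedIndex index;
    for (std::size_t i = 0; i + seed_len <= text.size(); ++i) {
        index[text.substr(i, seed_len)].push_back(i);
    }
    return index;
}

// Count substitutions between pattern and the text window starting at pos,
// giving up as soon as the count exceeds max_errors.
std::size_t mismatches(const std::string& text, std::size_t pos,
                       const std::string& pattern, std::size_t max_errors) {
    std::size_t errors = 0;
    for (std::size_t i = 0; i < pattern.size() && errors <= max_errors; ++i) {
        if (text[pos + i] != pattern[i]) ++errors;
    }
    return errors;
}

// Return the sorted start positions where pattern occurs in text with at
// most max_errors substitutions. seed_len must match the index.
std::vector<std::size_t> find_approx(const std::string& text,
                                     const SeedIndex& index,
                                     std::size_t seed_len,
                                     const std::string& pattern,
                                     std::size_t max_errors) {
    // The pattern must be long enough to hold max_errors + 1 disjoint seeds.
    assert(pattern.size() >= seed_len * (max_errors + 1));

    std::vector<std::size_t> hits;
    for (std::size_t s = 0; s <= max_errors; ++s) {
        const std::size_t offset = s * seed_len;
        auto found = index.find(pattern.substr(offset, seed_len));
        if (found == index.end()) continue;

        for (std::size_t seed_pos : found->second) {
            if (seed_pos < offset) continue;  // pattern would start before the text
            const std::size_t start = seed_pos - offset;
            if (start + pattern.size() > text.size()) continue;
            if (mismatches(text, start, pattern, max_errors) <= max_errors) {
                hits.push_back(start);
            }
        }
    }
    // Several seeds can propose the same start position, so deduplicate.
    std::sort(hits.begin(), hits.end());
    hits.erase(std::unique(hits.begin(), hits.end()), hits.end());
    return hits;
}
```

Under this scheme, 100 bp with three errors allows four seeds of 25 bp each, which are nearly unique in a mammalian genome. 20 bp with six errors needs seven seeds of at most 2 bp each, and a 2 bp seed matches almost everywhere, so the index filters out next to nothing and a smarter approach is required.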
We hope to expand this to give instant access to BAM files without reading them in first, and to parse FASTQ files in parallel.
There are also Python binding examples and a simple BWA-style aligner.