Question

Tool: Boost genetics library

8.6 years ago

a.thomason ▴ 10

I am developing a genetics library for Boost that performs indexed searches on DNA data.

Is anyone interested in helping me with the submission process? I come from a high-performance computing and compiler background, and I need serious bioinformaticians like yourselves to help guide the design.

https://github.com/andy-thomason/genetics

The library is currently in use as the back end to a CRISPR system.

The library is:

Readable (very few single-letter variables!)
C++11/14 compliant
Inline (no linking required)
Fast (100k matches per second on a typical server)
Uses memory mapping for instant access to files
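
For readers wondering what an "indexed search" on DNA data can look like, here is a minimal sketch. It is not the library's API or its actual index layout, which this post does not describe. It assumes a plain k-mer table with 2-bit packed bases, and every name in it (`base_code`, `KmerIndex`, `build_kmer_index`, `lookup_kmer`) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps A, C, G, T to 0..3 and returns -1 for anything else.
inline int base_code(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Hypothetical index: 2-bit packed k-mer -> start offsets in the sequence.
struct KmerIndex {
    std::size_t k = 0;
    std::unordered_map<std::uint32_t, std::vector<std::size_t>> table;
};

// Builds the index in one pass with a rolling 2-bit encoding.
// Windows containing a non-ACGT character (such as N) are skipped.
KmerIndex build_kmer_index(const std::string& sequence, std::size_t k) {
    assert(k >= 1 && k <= 16);  // a k-mer must fit in 32 bits
    KmerIndex index;
    index.k = k;
    const std::uint32_t mask =
        (k == 16) ? 0xFFFFFFFFu : ((1u << (2 * k)) - 1u);
    std::uint32_t packed = 0;
    std::size_t valid = 0;  // consecutive valid bases ending at position i
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        const int code = base_code(sequence[i]);
        if (code < 0) {
            valid = 0;
            packed = 0;
            continue;
        }
        packed = ((packed << 2) | static_cast<std::uint32_t>(code)) & mask;
        if (++valid >= k) {
            index.table[packed].push_back(i + 1 - k);
        }
    }
    return index;
}

// Returns the offsets of every exact occurrence of `kmer`.
// A query of the wrong length or with non-ACGT characters yields no hits.
std::vector<std::size_t> lookup_kmer(const KmerIndex& index,
                                     const std::string& kmer) {
    if (kmer.size() != index.k) return {};
    std::uint32_t packed = 0;
    for (char base : kmer) {
        const int code = base_code(base);
        if (code < 0) return {};
        packed = (packed << 2) | static_cast<std::uint32_t>(code);
    }
    const auto found = index.table.find(packed);
    if (found == index.table.end()) return {};
    return found->second;
}
```

Building the table costs one pass over the sequence. Each lookup is then a hash probe rather than a scan, which is the general reason an index makes repeated queries cheap.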

To try the library, do a recursive clone of the main Boost repository and then clone this library into "libs/genetics". Build using the "b2" tool, which is part of Boost.

Example use cases:

Find all 100bp string matches in the rat genome with up to three errors.
Find all 20bp string matches in the rat genome with up to six errors (this is hard!)
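
To make these use cases concrete, here is a brute-force sketch of matching with a bounded number of errors. It is not the library's algorithm, which this post does not describe. It assumes "errors" means substitutions only (no insertions or deletions), and both function names are hypothetical. The second helper shows one way to see why the 20bp case is hard. By the pigeonhole principle, a pattern with at most k errors that is cut into k+1 pieces must contain at least one piece that matches exactly. For 100bp with three errors that gives exact seeds of 25 bases. For 20bp with six errors the seeds are only 2 or 3 bases long, which match almost everywhere in a genome.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: returns every offset in
// `text` where `pattern` matches with at most `max_mismatches` substitutions.
std::vector<std::size_t> find_with_mismatches(const std::string& text,
                                              const std::string& pattern,
                                              std::size_t max_mismatches) {
    assert(!pattern.empty());
    std::vector<std::size_t> hits;
    if (pattern.size() > text.size()) return hits;
    for (std::size_t pos = 0; pos + pattern.size() <= text.size(); ++pos) {
        std::size_t mismatches = 0;
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            // Stop comparing this window as soon as the error budget is exceeded.
            if (text[pos + i] != pattern[i] && ++mismatches > max_mismatches) {
                break;
            }
        }
        if (mismatches <= max_mismatches) hits.push_back(pos);
    }
    return hits;
}

// Hypothetical helper: length of the shortest piece when a pattern is cut
// into (max_errors + 1) pieces, i.e. the exact seed the pigeonhole argument
// guarantees.
std::size_t pigeonhole_seed_length(std::size_t pattern_length,
                                   std::size_t max_errors) {
    return pattern_length / (max_errors + 1);
}
```

The scan costs time proportional to text length times pattern length, so it is only useful for checking results on small inputs. An indexed approach avoids visiting every offset.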

We hope to expand this to give instant access to BAM files without reading them into memory first, and to parse FASTQ files in parallel.

There are Python binding examples and a simple BWA-style aligner.

Boost • 1.6k views

Updated 19 months ago by Ram 43k • written 8.6 years ago by a.thomason ▴ 10