Hi, I am new to bioinformatics and I was wondering if there is any C++ library for efficiently reading and indexing multi-FASTA files?
"efficiently ... indexing" is a bit vague, but I guess you will find useful things in SeqAn: http://www.seqan.de/
indexing by sequence name and position (like samtools fai): http://trac.seqan.de/wiki/Tutorial/IndexedFastaIO
other indices (suffix tree...): http://trac.seqan.de/wiki/Tutorial/Indices
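To make the "name and position" idea concrete, here is a minimal sketch of a faidx-style index in plain standard C++. It is not SeqAn's or samtools' implementation; `FaiEntry`, `build_index` and `fetch` are hypothetical names made up for this example. It assumes `\n` line endings and that every line of a record except the last has the same length, which is the same restriction a `.fai` index relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>  // only needed if you feed it a std::istringstream
#include <string>

// Hypothetical record holding the same information as one .fai line.
struct FaiEntry {
    std::uint64_t length = 0;      // number of bases in the sequence
    std::uint64_t offset = 0;      // byte offset of the first base
    std::uint64_t line_bases = 0;  // bases on each full line
    std::uint64_t line_width = 0;  // bytes on each full line, newline included
};

// Scan a multi-FASTA stream once and record where each sequence starts.
// Assumption: '\n' line endings and uniform line length within a record.
std::map<std::string, FaiEntry> build_index(std::istream& in) {
    std::map<std::string, FaiEntry> index;
    std::string line, name;
    std::uint64_t pos = 0;  // byte offset of the start of the current line
    while (std::getline(in, line)) {
        const std::uint64_t next = pos + line.size() + 1;  // +1 for '\n'
        if (!line.empty() && line[0] == '>') {
            // The sequence name is the header text up to the first whitespace.
            name = line.substr(1, line.find_first_of(" \t") - 1);
            index[name] = FaiEntry{};
            index[name].offset = next;  // bases begin on the following line
        } else if (!name.empty() && !line.empty()) {
            FaiEntry& e = index[name];
            if (e.line_bases == 0) {  // first sequence line sets the layout
                e.line_bases = line.size();
                e.line_width = line.size() + 1;
            }
            e.length += line.size();
        }
        pos = next;
    }
    return index;
}

// Random access: return `count` bases starting at 0-based position `start`.
// The stream must be seekable (a file or string stream).
std::string fetch(std::istream& in, const FaiEntry& e,
                  std::uint64_t start, std::uint64_t count) {
    assert(e.line_bases > 0);
    std::string out;
    if (start >= e.length) return out;
    if (count > e.length - start) count = e.length - start;  // clip to the end
    const std::uint64_t byte =
        e.offset + (start / e.line_bases) * e.line_width + start % e.line_bases;
    in.clear();  // reset EOF left over from a previous scan
    in.seekg(static_cast<std::streamoff>(byte));
    char c;
    while (out.size() < count && in.get(c)) {
        if (c != '\n') out.push_back(c);  // skip line breaks inside the record
    }
    return out;
}
```

In practice you would open the file with `std::ifstream` in binary mode, call `build_index` once, and then call `fetch` as often as needed. The point is that a lookup costs one seek plus a short read, instead of re-parsing the whole file.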
I've got something like this in my sources. See:
https://code.google.com/p/variationtoolkit/source/browse/trunk/src/fastareader.cpp
https://code.google.com/p/variationtoolkit/source/browse/trunk/src/fastareader.h
For indexing the sequences, you can use a key/value datastore such as LevelDB or Berkeley DB, or an embedded SQL database such as SQLite.
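As a rough illustration of the key/value approach, the sketch below keys each sequence by name and stores its file location as a small text value, which is the shape of data a string-to-string store expects. A `std::map` stands in for the real datastore here, so none of this is the LevelDB, Berkeley DB or SQLite API; `SeqLocation`, `encode`, `decode`, `put` and `get` are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical record: where one sequence lives inside the FASTA file.
struct SeqLocation {
    std::uint64_t offset = 0;  // byte offset of the first base
    std::uint64_t length = 0;  // number of bases
};

// Serialize the record as "offset length" so it can be stored as a value.
std::string encode(const SeqLocation& loc) {
    return std::to_string(loc.offset) + ' ' + std::to_string(loc.length);
}

// Parse a stored value back into a record; returns false if it is malformed.
bool decode(const std::string& value, SeqLocation& loc) {
    std::istringstream in(value);
    return static_cast<bool>(in >> loc.offset >> loc.length);
}

// Stand-in for a persistent key/value store (assumption: string keys/values).
using KeyValueStore = std::map<std::string, std::string>;

void put(KeyValueStore& db, const std::string& name, const SeqLocation& loc) {
    db[name] = encode(loc);
}

bool get(const KeyValueStore& db, const std::string& name, SeqLocation& loc) {
    const auto it = db.find(name);
    return it != db.end() && decode(it->second, loc);
}
```

Swapping the map for an on-disk store would mean the index survives between runs, so the FASTA file only has to be scanned once.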
What do you mean by "indexing"? Do you mean what "samtools faidx" does?