Question

Multi-Fasta Sequence Input C++ Library

2

12.2 years ago

Lexxx233 ▴ 20

Hi, I am new to bioinformatics and I was wondering if there is a C++ library for efficiently reading and indexing multi-FASTA files?

fasta library • 6.3k views

updated 12.2 years ago by Pierre Lindenbaum 164k • written 12.2 years ago by Lexxx233 ▴ 20

0

What do you mean by "indexing"? Do you mean what "samtools faidx" does?

12.2 years ago by Gabriel R. ★ 2.9k

score 2 · Answer 1 · 2012-08-24

12.2 years ago

Ido Tamir 5.2k

"efficiently .. indexing" is a bit vague, but I guess you will find nice things in SeqAn: http://www.seqan.de/

Indexing by sequence name and position (like a samtools .fai index): http://trac.seqan.de/wiki/Tutorial/IndexedFastaIO
Other indices (suffix tree...): http://trac.seqan.de/wiki/Tutorial/Indices

score 1 · Answer 2 · 2012-08-24

I've got something like this in my sources. See:

For indexing the sequences, you can use a key/value datastore such as LevelDB or Berkeley DB, or an embedded SQL database such as SQLite.