Which C++ Libraries Are Best For Dealing With Fastq Files?
11.6 years ago

I would like to rewrite some Perl scripts in something faster. I haven't written C++ since the Clinton administration. Granted, I am not married to C++ per se, but I need something that benchmarks well.

Which C++ libraries are people using to deal with NGS data?

c fastq next-gen sequencing • 8.7k views

Which OS/CPU architecture do you need it for?


RHEL5, x86_64.

11.6 years ago

I saw a SeqAn poster at ISMB last year. I have no experience with the library (or with C++) myself, but it supports the FASTQ format, and the developers gave the impression of being quite competent.

SeqAn file formats

11.6 years ago
User 59 13k

I admit I wouldn't dream of doing this myself; I tend to handle FASTQ files with applications other people develop.

However, there is a FASTA/FASTQ C++ parser here:

http://lh3lh3.users.sourceforge.net/parsefastq.shtml which might serve as a base for what you want to do.

It's from Heng Li, who also works on SAMtools, BWA and MAQ.


Boo: in http://lh3lh3.users.sourceforge.net/kseq.shtml, the result of every malloc should be checked against NULL (lines 56, 121, 188, ...) :-(
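
For anyone wondering what a checked allocation looks like in practice, here is a minimal sketch. The helper names checked_malloc and checked_realloc are hypothetical and are not part of kseq; they only illustrate the idea of never using an unchecked pointer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper (not part of kseq): like malloc, but throws
// std::bad_alloc instead of handing back NULL on failure.
inline void* checked_malloc(std::size_t n) {
    void* p = std::malloc(n);
    // malloc(0) may legitimately return NULL, so only treat NULL as an
    // error when a non-zero size was requested.
    if (p == nullptr && n != 0) {
        throw std::bad_alloc();
    }
    return p;
}

// Hypothetical helper: same idea for realloc. If realloc fails, the
// original block is still valid and still owned by the caller.
inline void* checked_realloc(void* old_block, std::size_t n) {
    void* p = std::realloc(old_block, n);
    if (p == nullptr && n != 0) {
        throw std::bad_alloc();
    }
    return p;
}
```

In plain C the equivalent would be to test the pointer and return an error code (or abort) instead of throwing.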


+1 - Highly recommended. That header supports compressed files too, which speeds up I/O-bound processing. One caveat for C++, though: the header wants char* buffers and FILE*-style handles, not C++ strings and iostreams, but that's easy enough to manage.
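
To illustrate the "easy enough to manage" part, here is a rough sketch of copying a C-style record into std::string members. CBuffer and CRecord are hypothetical stand-ins for a C parser's length-plus-pointer string type and record struct; they are assumptions for illustration, not kseq's actual definitions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a C-style string: a length and a char pointer.
struct CBuffer {
    std::size_t len;
    const char* data;
};

// Hypothetical stand-in for the record a C FASTQ parser hands back.
struct CRecord {
    CBuffer name;
    CBuffer seq;
    CBuffer qual;
};

// C++-side record that owns its data as std::string.
struct FastqRecord {
    std::string name;
    std::string seq;
    std::string qual;
};

// Copy a C buffer into a std::string. Using the explicit length avoids a
// strlen pass, and a NULL pointer becomes an empty string.
inline std::string buffer_to_string(const CBuffer& b) {
    return b.data != nullptr ? std::string(b.data, b.len) : std::string();
}

// Convert one parsed record. This copies each field once, so the result
// stays valid after the C parser reuses its internal buffers.
inline FastqRecord to_cpp_record(const CRecord& r) {
    return FastqRecord{buffer_to_string(r.name),
                       buffer_to_string(r.seq),
                       buffer_to_string(r.qual)};
}
```

The copy is the price of owning the data; if the per-read cost matters, you can work on the parser's buffers directly and convert only the records you actually keep.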

11.1 years ago
Manuel ▴ 400

If you work with NGS data and want to try SeqAn as Michael already suggested, have a look at this tutorial for importing read data. Also, their documentation has greatly improved recently.

7.0 years ago
Luiz Irber • 0

Another option is SeqDB

3.3 years ago
cartoonist ▴ 90

Since I could not find a C++ library that met my requirements, I rewrote the kseq library (by @lh3) in C++ using templates and called it kseq++. The repository below compares its performance with the original kseq and with SeqAn:

https://github.com/cartoonist/kseqpp

SeqAn uses its own string class. If you do not use that class, converting back to std::string is really expensive (3x slower on my workstation).
