Which C++ Libraries Are Best For Dealing With Fastq Files?
5
9
11.6 years ago

I would like to rewrite some Perl scripts in something faster. I haven't written C++ since the Clinton administration. Granted, I am not married to C++ per se, but I need something that benchmarks well.

Which C++ libraries are people using to deal with NGS data?

c fastq next-gen sequencing • 8.7k views
0

Which OS/CPU architecture do you need it for?

0

RHEL5 x86_64 ..

10
11.6 years ago

I saw a SeqAn poster at ISMB last year. I have no experience with the library (or with C++) myself, but it supports the FASTQ format, and the developers gave the impression of being quite competent.

SeqAn file formats

8
11.6 years ago
User 59 13k

I admit I wouldn't dream of doing this myself; I tend to handle FASTQ files with applications other people have developed.

However, there is a FASTA/FASTQ C++ parser here:

http://lh3lh3.users.sourceforge.net/parsefastq.shtml which might serve as a base for what you want to do.

It's from Heng Li, who also works on SAMtools, BWA and MAQ.

2

Boooh! In http://lh3lh3.users.sourceforge.net/kseq.shtml, every malloc call should be checked against NULL (lines 56, 121, 188, ...) :-(
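
To illustrate the kind of check meant here, a minimal sketch of a wrapper that never hands a NULL pointer back to the caller. The name `checked_malloc` is hypothetical and is not part of kseq.h; throwing `std::bad_alloc` is just one possible policy (a C program would more likely print an error and exit).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper for illustration only; not part of kseq.h.
// Wraps std::malloc so that a failed allocation is reported immediately
// instead of surfacing later as a NULL dereference.
void* checked_malloc(std::size_t size) {
    // malloc(0) may legitimately return NULL, so request at least one byte
    // to keep "NULL means failure" unambiguous.
    void* p = std::malloc(size == 0 ? 1 : size);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return p;
}
```

The caller still releases the memory with `std::free`, exactly as with a plain `malloc`.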

1

+1 - Highly recommended. That header supports compressed files too, which speeds up IO-bound processing. One caveat for C++, though: the header works with char and FILE rather than C++ strings and iostreams, but that's easy enough to manage.

3
11.0 years ago
Manuel ▴ 400

If you work with NGS data and want to try SeqAn as Michael already suggested, have a look at this tutorial for importing read data. Also, their documentation has greatly improved recently.

0
7.0 years ago
Luiz Irber • 0

Another option is SeqDB

0
3.2 years ago
cartoonist ▴ 80

Since I could not find a C++ library that met my requirements, I rewrote the kseq library (by @lh3) in C++ using templates and called it kseq++. I compared its performance with the original kseq and SeqAn here:

https://github.com/cartoonist/kseqpp

SeqAn uses its own string class. If you do not use that class, converting back to std::string is really expensive (3x slower on my workstation).