Align against nt database
7.7 years ago
tucanj ▴ 100

What is the best tool for aligning millions of 2x101bp Illumina reads against the NCBI nt database (100+ GB) using global alignment on a machine with 50GB of RAM?

Bowtie2 keeps crashing when I build the index, even if I divide nt into smaller parts. It finds parameters and passes the memory test, but the process is then "Killed" during the sorting stage. It did work when I divided nt into 5GB pieces (but indexing took forever).

The goal is to find novel sequences. I also thought about using BBDuk to filter against nt, but I am not sure I will have enough RAM.

sequencing nt alignment RNA-Seq • 2.4k views
7.7 years ago

BBMap can be used to align to nt. Its indexing is very fast compared to tools that build a Burrows-Wheeler-transform index.

BBMap uses around 6 bytes of RAM per reference base, or around 3 bytes in low-memory mode (with the "usemodulo" flag). So, with 50GB of RAM you would need to subdivide nt into pieces that fit within that budget. BBDuk, on the other hand, would be far faster... but it uses ~20 bytes per reference base, so you'd need to subdivide nt even more.
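To make the subdivision arithmetic concrete, here is a rough sketch of how many pieces nt would have to be split into for each per-base memory figure above. The reference size (100 billion bases, taken loosely from the "100+ GB" in the question) and the 80% usable-RAM margin are assumptions for illustration, and `chunks_needed` is a hypothetical helper, not part of BBTools.

```python
def chunks_needed(ref_bases, bytes_per_base, ram_bytes, usable_percent=80):
    """Estimate how many pieces a reference must be split into.

    usable_percent is an assumed safety margin (not a BBTools setting)
    that leaves some RAM for the OS and for the reads themselves.
    Returns (number_of_chunks, max_bases_per_chunk).
    """
    budget = ram_bytes * usable_percent // 100        # bytes available for the index
    bases_per_chunk = budget // bytes_per_base        # largest chunk that fits
    n_chunks = -(-ref_bases // bases_per_chunk)       # ceiling division
    return n_chunks, bases_per_chunk


NT_BASES = 100 * 10**9   # assumption: roughly 100 billion bases in nt
RAM = 50 * 10**9         # 50GB machine from the question

for label, bytes_per_base in [("BBMap", 6), ("BBMap usemodulo", 3), ("BBDuk", 20)]:
    n, size = chunks_needed(NT_BASES, bytes_per_base, RAM)
    print(f"{label}: {n} chunks of at most {size:,} bases")
```

The real numbers will differ with the actual size of nt and with how much memory the tool needs beyond its index, so treat the output as a starting point for choosing a chunk size.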

Edit - you can, however, use BBDuk's "speed" flag to reduce memory usage and increase speed at the expense of sensitivity. Any method you use to align millions of reads against nt is going to be slow, so filtering out as many as possible beforehand is probably prudent. The "speed" flag ignores a fraction of the kmer space; "speed=0" uses all reference kmers; "speed=1" ignores 1/16 of the reference kmers (reducing memory consumption by 1/16); and the maximum "speed=15" reduces memory consumption by 15/16, to a little over 1 byte per reference base. Sensitivity is not affected much for ~150bp reads and genome-size references up to around speed 12 (75% memory reduction).
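The effect of the "speed" flag on memory can be sketched the same way. This assumes memory scales linearly with the fraction of reference kmers retained, as described above, starting from the ~20 bytes per base quoted for BBDuk; `bbduk_bytes_per_base` is a hypothetical helper for the arithmetic only.

```python
def bbduk_bytes_per_base(speed, base_bytes=20.0):
    """Approximate BBDuk memory per reference base for a given speed setting.

    Assumes each speed step drops 1/16 of the reference kmers and that
    memory shrinks proportionally (base_bytes is the ~20 bytes/base figure).
    """
    if not 0 <= speed <= 15:
        raise ValueError("speed must be between 0 and 15")
    return base_bytes * (16 - speed) / 16


RAM = 50 * 10**9  # 50GB machine from the question

for speed in (0, 8, 12, 15):
    per_base = bbduk_bytes_per_base(speed)
    max_bases = int(RAM / per_base)  # upper bound, ignoring other overhead
    print(f"speed={speed}: ~{per_base:.2f} bytes/base, "
          f"up to ~{max_bases:,} reference bases in 50GB")
```

At speed=12 this gives about 5 bytes per base, and at speed=15 about 1.25 bytes per base, which matches the "little over 1 byte" figure above.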


Really detailed and helpful response. Thank you!

7.7 years ago

You could use the SNAP aligner, which is used in the SURPI pipeline. I believe the SURPI authors have done some benchmarking.
