Why are mothur reference sequences stored as an MSA?
6.5 years ago
Lynxoid ▴ 230

I am using mothur to analyse a set of 16S rRNA sequences. mothur takes about 5 hours on my modest dataset of 12K sequences (they are 1.5 kbp long, but that's not the culprit). Looking at the logs, mothur spends most of its time reading the reference 16S sequences (35-45 min) and aligning my reads to them (2 h for 6K sequences, 8 min for 200 sequences). I am using an RDP reference, which turns out to be 69 GB. Peeking into the reference FASTA file, I see that it does not contain plain sequences: it looks like a multiple sequence alignment (MSA) of all of them. If I remove the gaps from the reference sequences, the file size drops to 3.6 GB. What is the reason for storing the reference 16S rRNAs as an MSA? Can I just use the de-gapped reference sequences to align my reads?
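For concreteness, the de-gapping step mentioned above can be sketched roughly as follows. This is only a minimal illustration, not necessarily how the file was actually processed: the function name `degap_fasta` is hypothetical, and it assumes that gaps in the aligned reference are written as `-` or `.` and that header lines start with `>`.

```python
def degap_fasta(lines, gap_chars="-."):
    """Yield FASTA lines with gap characters stripped from sequence lines.

    Assumption: gaps are encoded as '-' or '.'; header lines start with '>'.
    """
    table = str.maketrans("", "", gap_chars)  # translation table that deletes gap characters
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line                      # headers pass through unchanged
        elif line:
            stripped = line.translate(table)
            if stripped:                    # skip sequence lines that were all gaps
                yield stripped


# Toy aligned records (made-up data, for illustration only)
aligned = [">ref1", "..AC-GT--A..", ">ref2", "--ACGGT-A..."]
for out_line in degap_fasta(aligned):
    print(out_line)
```

Because the function works on any iterable of lines, an open file handle can be passed in directly, so a large reference can be streamed line by line without loading it all into memory.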

alignment sequencing • 1.7k views
