Question: Why are the mothur reference sequences stored as an MSA?
2.5 years ago by
Pleasanton, CA
Lynxoid210 wrote:

I am using mothur to analyse a set of 16S rRNA sequences. The run takes about 5 hours on my modest dataset of 12K sequences (they are 1.5 kbp long, but that's not the culprit). I looked at the logs, and mothur spends most of its time (35-45 min) reading the reference 16S sequences and aligning to them (2 h for 6K sequences, 8 min for 200 sequences). I am using an RDP reference, which turns out to be 69 GB. Peeking into the reference FASTA file, I see that it does not hold plain sequences: it looks like a multiple sequence alignment (MSA) of all of them. If I remove the gaps from the reference sequences, the file size drops to 3.6 GB. What is the reason for storing an MSA of the reference 16S rRNAs? Can I just use the de-gapped reference sequences to align my reads?
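
For anyone who wants to reproduce the gap-removal step, here is a minimal sketch of what I mean by "de-gapping". It assumes the reference is an aligned FASTA file in which `-` and `.` are the only gap characters; the function name `degap_fasta` and the sample records are hypothetical and are not part of mothur.

```python
import io

# Assumption: gaps in the aligned reference are written as '-' or '.'.
GAP_CHARS = "-."
_DELETE_GAPS = str.maketrans("", "", GAP_CHARS)


def degap_fasta(lines):
    """Yield FASTA lines with gap characters removed from sequence lines.

    Header lines (starting with '>') pass through unchanged. Sequence
    lines that consist only of gaps are dropped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        elif line:
            stripped = line.translate(_DELETE_GAPS)
            if stripped:
                yield stripped


# Hypothetical two-record aligned input, used only for illustration.
aligned = io.StringIO(">seq1\n..AC--GU..\n>seq2\nA-C-G\n")
for out_line in degap_fasta(aligned):
    print(out_line)
```

The generator works line by line, so a file of this size never has to fit in memory; passing an open file handle instead of the `io.StringIO` object works the same way. This only illustrates the gap removal itself and says nothing about whether the aligner accepts an unaligned reference.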

sequencing alignment • 828 views
modified 23 months ago by Biostar ♦♦ 20 • written 2.5 years ago by Lynxoid210