Question: Why MSA in MOTHUR reference sequences?
gravatar for Lynxoid
23 months ago by
Pleasanton, CA
Lynxoid210 wrote:

I am using Mothur to analyse a set of 16s rRNA sequences. Mothur takes about 5 hours on my modest dataset with 12K sequences (they are 1.5kbp long, but that's not the culprit). I looked at the logs and mothur spends the most time (35-45min) reading the reference 16s sequences and aligning to them (2h for 6K sequences, 8min for 200 seqs). I am using an RDP references which turns out to be 69Gb. Peeking into the reference fasta file, I see that these are not just sequences, it looks like a multiple sequence alignment of all of them. If I remove the gaps from the reference seqs, the file size drops to 3.6Gb. What is the reason for storing MSA of the reference 16s rRNAs? Can I just use the de-gapped reference sequences to align my reads?

sequencing alignment • 571 views
ADD COMMENTlink modified 15 months ago by Biostar ♦♦ 20 • written 23 months ago by Lynxoid210
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1384 users visited in the last hour