Question

Gene mutations discovery

9.3 years ago

francesca.defilippis ▴ 20

Hello!

I'm approaching DNA mutation analysis for the first time. I have DNA sequences of a structural gene (454 reads, about 450 bp, single-end) from 10 samples, and I need to find mutations and quantify the abundance of each mutant in each sample.

This is how I'm planning to proceed:

1. Demultiplex the sequences with QIIME.
2. Trim low-quality bases and filter out reads that are too short.
3. Cluster the reads at 100% similarity with USEARCH.
4. Align the centroids to reference sequences retrieved from NCBI with MUMmer.
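As a rough illustration of the trimming, length-filtering and 100%-clustering steps, here is a minimal Python sketch for one sample. It is not how QIIME or USEARCH implement these steps; the thresholds MIN_QUAL and MIN_LENGTH and all function names are hypothetical, and reads are assumed to be already parsed into (sequence, list of Phred scores) pairs. Clustering at 100% identity amounts to counting identical sequences, and those counts give the abundance of each variant in the sample.

```python
from collections import Counter

# Hypothetical thresholds: tune them for your own data.
MIN_QUAL = 20      # trailing bases with a Phred score below this are trimmed
MIN_LENGTH = 400   # reads shorter than this after trimming are discarded


def trim_3prime(seq, quals, min_qual=MIN_QUAL):
    """Trim low-quality bases from the 3' end of a read."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


def dereplicate(reads, min_qual=MIN_QUAL, min_length=MIN_LENGTH):
    """Trim, length-filter, and count identical sequences (100% clustering).

    `reads` is an iterable of (sequence, phred_scores) pairs from ONE sample.
    Returns a Counter mapping each unique sequence to its read count.
    """
    counts = Counter()
    for seq, quals in reads:
        seq, _ = trim_3prime(seq, quals, min_qual)
        if len(seq) >= min_length:
            counts[seq] += 1
    return counts


def relative_abundance(counts):
    """Convert per-sequence counts into fractions of the sample total."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}
```

Note that this sketch treats reads of different lengths as different sequences even when one is a prefix of the other, and that 454 homopolymer-length errors will also inflate the number of unique sequences, so exact-match counts are an upper bound on the number of true variants.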

My questions are:

First of all, is my approach correct? Does anybody have experience with this kind of analysis, and which approach do you use?

Second, I'm not sure MUMmer is a good choice. It is designed for SNP discovery in genomes, so I don't know whether it is suitable for sequences that I expect to be very similar to one another. If not, can anyone suggest other software?
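For comparing a handful of ~450 bp centroids against a few reference sequences, a plain pairwise global alignment is one alternative to a genome-scale tool. Below is a minimal Needleman-Wunsch sketch in Python; it is a swapped-in textbook technique, not MUMmer's method, and the scores (match=1, mismatch=-1, gap=-2) and function names are illustrative assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def list_differences(ref, query):
    """Report (ref_position, ref_base, query_base) for each mismatch or gap.

    Positions are 1-based on the reference; an insertion in the query is
    reported at the reference position it follows (0 if at the very start).
    """
    aln_ref, aln_q = global_align(ref, query)
    diffs, pos = [], 0
    for r, q in zip(aln_ref, aln_q):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs
```

A linear gap penalty is a simplification; since 454 data are prone to homopolymer indels, affine gap scoring or a dedicated aligner is likely a better fit in practice.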

Thanks

Francesca

Tags: mummer, SNP, mutation

Ram · Answer 1 · 2015-01-07

9.3 years ago

Devon Ryan 104k

I'd skip the clustering and just align everything; just use a few more cores. I'd give BWA a try rather than MUMmer, mostly because BWA typically produces good-quality results for downstream variant calling, and also because it's reasonably fast.
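After alignment, variant calling conceptually starts from tallying the bases observed at each reference position (a pileup). The Python sketch below illustrates that idea on SAM-style fields parsed by hand: the 1-based POS, the CIGAR string and the read sequence. It is an illustration only, not how BWA or any particular variant caller works: it ignores base and mapping qualities and does not record inserted bases. In practice you would read the BAM with a tool such as samtools or pysam. Running it once per sample gives per-sample allele counts.

```python
import re
from collections import Counter, defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def pileup(reads):
    """Count bases observed at each reference position.

    `reads` is an iterable of (pos, cigar, seq) tuples, where pos is the
    1-based leftmost mapping position from a SAM record. Returns
    {ref_position: Counter(base -> count)}; deleted bases are counted as '-'.
    """
    counts = defaultdict(Counter)
    for pos, cigar, seq in reads:
        ref = pos  # current reference position (1-based)
        q = 0      # current index into the read sequence
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":      # aligned bases: consume read and reference
                for k in range(length):
                    counts[ref + k][seq[q + k]] += 1
                ref += length
                q += length
            elif op in "IS":     # insertion or soft clip: consume read only
                q += length
            elif op == "D":      # deletion: consumes reference only
                for k in range(length):
                    counts[ref + k]["-"] += 1
                ref += length
            elif op == "N":      # skipped region: consumes reference only
                ref += length
            # H (hard clip) and P (padding) consume neither
    return dict(counts)


def allele_frequencies(column):
    """Turn one position's base counts into fractions."""
    total = sum(column.values())
    return {base: n / total for base, n in column.items()}
```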

Hi! Thanks for your prompt reply! I had considered BWA or Bowtie, but wouldn't using them mean losing new variants? I could only quantify the number of sequences matching the ones I have as references. Is that right?

Reply by francesca.defilippis ▴ 20 · 9.3 years ago

BWA is probably the most popular aligner for finding new variants, so no, it won't lose them (unless a sequence is very highly divergent from the reference).
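The reason aligning to a known reference does not hide new variants is that the reference only supplies coordinates: any base that differs from it shows up as a non-reference allele, whether or not it was known before. A minimal Python sketch of that idea, assuming per-position base counts for one sample have already been obtained (for example from a pileup of the alignments); the thresholds min_freq and min_depth are hypothetical values, not recommendations, and dedicated variant callers use statistical models rather than fixed cutoffs.

```python
def call_variants(reference, columns, min_freq=0.05, min_depth=10):
    """Report non-reference alleles from per-position base counts.

    reference: reference sequence (position 1 is reference[0]).
    columns:   {1-based position: {base: count}} for one sample.
    Returns (position, ref_base, alt_base, frequency) tuples. Every allele
    that differs from the reference is considered, known or novel;
    min_freq and min_depth (hypothetical) screen out likely errors.
    """
    calls = []
    for pos in sorted(columns):
        column = columns[pos]
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        ref_base = reference[pos - 1]
        for base, n in sorted(column.items()):
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                calls.append((pos, ref_base, base, round(freq, 3)))
    return calls


# Example: a 10% T allele at position 2 and a 50% C allele at position 4.
columns = {1: {"A": 20}, 2: {"C": 18, "T": 2}, 3: {"G": 99, "A": 1}, 4: {"T": 5, "C": 5}}
print(call_variants("ACGT", columns))
```

The frequency column is the per-sample abundance of each allele. Note that per-position frequencies do not say which mutations co-occur on the same molecule; counting whole identical reads per sample keeps that linkage.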

Reply by Devon Ryan 104k · 9.3 years ago