Aligning multiple overlapping DNA sequence reads to predicted sequence not working
3.8 years ago

Hi All,

I work on a large protein (5 subunits, 5850 bp) that I have to sequence using multiple Sanger sequencing reactions (usually ~10). I want to align the sequencing results to the predicted sequence of my protein so that I can see the whole result at once and identify any gaps where I don't have coverage. Each read aligns to the template on its own without problems, but once I align several reads to the template at once (>5), some no longer align properly and end up spread over the whole sequence length with lots of gaps.

My question is: what alignment strategy or algorithm should I use to avoid this? At the moment I am using mafft --globalpair from within the AliView sequence viewer app. I was using MUSCLE, but it gave even worse results: it started spreading reads over the whole sequence length with fewer sequences in the alignment than mafft did.

Some ideas I've had: I assume that as more sequencing reads are added, the algorithm starts aligning the reads to each other rather than to my protein sequence. Can I make the alignment program give the most weight to my protein sequence during the alignment? Alternatively, am I better off performing pairwise alignments first and then combining these into a global alignment later? If so, what is the best program for the combining step?
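To make the pairwise-first idea concrete, here is a minimal sketch in plain Python of what I have in mind. It is an illustration under assumptions, not a tested pipeline: the function names (`local_hit`, `place_reads`, `coverage_gaps`) are hypothetical, the scoring values are arbitrary, gaps are scored linearly, and a simple Smith-Waterman-style local alignment stands in for whatever a real aligner would do. Each read is placed on the template independently (trying both strands), so reads cannot pull each other out of position, and the uncovered template intervals are then listed.

```python
# Hypothetical helper names; scoring parameters are arbitrary assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def local_hit(reference, read, match=2, mismatch=-3, gap=-4):
    """Best local alignment of read against reference.

    Returns (score, ref_start, ref_end) as a 0-based, half-open interval.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    best = (0, 0, 0)
    prev_score = [0] * (len(reference) + 1)
    prev_start = list(range(len(reference) + 1))
    for base in read:
        cur_score, cur_start = [0], [0]
        for j, ref_base in enumerate(reference, start=1):
            diag = prev_score[j - 1] + (match if base == ref_base else mismatch)
            up = prev_score[j] + gap        # read base opposite a gap
            left = cur_score[j - 1] + gap   # reference base opposite a gap
            score = max(0, diag, up, left)
            if score == 0:
                origin = j                  # a new alignment would start here
            elif score == diag:
                origin = prev_start[j - 1]
            elif score == up:
                origin = prev_start[j]
            else:
                origin = cur_start[j - 1]
            cur_score.append(score)
            cur_start.append(origin)
            if score > best[0]:
                best = (score, origin, j)
        prev_score, prev_start = cur_score, cur_start
    return best


def place_reads(reference, reads):
    """Place every read on the reference independently, on its better strand.

    reads maps a read name to its sequence; the result maps each name to
    (strand, ref_start, ref_end, score).
    """
    placements = {}
    for name, seq in reads.items():
        fwd = local_hit(reference, seq)
        rev = local_hit(reference, reverse_complement(seq))
        strand, (score, start, end) = ("+", fwd) if fwd[0] >= rev[0] else ("-", rev)
        placements[name] = (strand, start, end, score)
    return placements


def coverage_gaps(ref_length, intervals):
    """Return the reference intervals not covered by any (start, end) pair."""
    gaps, covered_to = [], 0
    for start, end in sorted(intervals):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < ref_length:
        gaps.append((covered_to, ref_length))
    return gaps


if __name__ == "__main__":
    # Toy template and reads, made up purely for illustration.
    template = "ATGGCTAGCTAGGATCCGATTACAGGCTAAGCTTGACCGTTAACGGCATGCATCGA"
    toy_reads = {
        "read1": template[0:20],
        "read2": reverse_complement(template[15:35]),
    }
    hits = place_reads(template, toy_reads)
    for name, (strand, start, end, score) in hits.items():
        print(name, strand, start, end, score)
    print("uncovered:", coverage_gaps(len(template), [h[1:3] for h in hits.values()]))
```

Pure Python is slow for a 5850 bp template with ~1000 bp reads, so this is only meant to show the logic: because every read is scored against the template alone, adding an eleventh read cannot change where the first ten land.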

Thanks in advance! Any help appreciated.

sequence-alignment sequencing • 522 views