Question: Global Alignment Of Short Reads
Pasta (Switzerland) wrote 6.9 years ago:

Hi,

I have short reads that I would like to align on a bacterial genome. I need to know the exact identity between the reads and the reference genome, even if there are mismatches or gaps near the 5' or 3' ends.

To make myself clear: for this 20 nt sequence (mismatches in lowercase) I would expect an identity of 16/20 = 80%

GaaGCCGTTCTTATAGTaaT
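
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the identity definition used in the question, assuming that lowercase letters mark mismatches and that there are no gaps; the function name `identity_from_marked` is hypothetical.

```python
def identity_from_marked(seq: str) -> float:
    """Return the fraction of positions written in uppercase.

    Assumption: uppercase = match, lowercase = mismatch, no gaps.
    """
    matches = sum(1 for base in seq if base.isupper())
    return matches / len(seq)


if __name__ == "__main__":
    # 16 uppercase positions out of 20
    print(identity_from_marked("GaaGCCGTTCTTATAGTaaT"))
```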

I tried BWA, but for a 36 bp read it allows at most 5 mismatches, and it doesn't seem able to report the identity for reads with more mismatches/gaps (is there any way to do that?)

I am thinking of using a global aligner (free license) that should also work correctly on relatively short reads (36bp). If this aligner could output an easy-to-parse output file that would be X-mas for me.

Also, I tried Exonerate but I cannot figure out how to make it work the way I want (the options --exhaustive --model affine:global do not seem to work...).

Any help is appreciated.

Thanks


You will need a more sensitive tool than bwa, but this comes at the price of increased run time. How many reads do you have? Nowadays you can run exact algorithms on up to a few thousand reads, but for more than that a heuristic approach is required.

written 6.9 years ago by Michael Dondrup

I have 13 million reads....

written 6.9 years ago by Pasta
Jeremy Leipzig (Philadelphia, PA) wrote 6.9 years ago:

Novoalign can do full Needleman-Wunsch. It is quite sensitive as well.

written 6.9 years ago by Jeremy Leipzig
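
For readers unfamiliar with Needleman-Wunsch, here is a minimal sketch of global alignment between a read and a candidate reference window, reporting matches over alignment columns. It is not Novoalign's implementation: the linear gap penalty, the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name `global_identity` are all assumptions chosen for illustration, and a plain O(n*m) table like this is only practical for short sequences.

```python
def global_identity(read, ref, match=1, mismatch=-1, gap=-2):
    """Globally align read to ref and return (matches, alignment_columns).

    Assumed scoring: linear gap penalty, illustrative values only.
    """
    n, m = len(read), len(ref)

    def sub(i, j):
        # Substitution score for read[i-1] against ref[j-1]
        return match if read[i - 1] == ref[j - 1] else mismatch

    # Fill the dynamic-programming table; row/column 0 are all-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two bases
                score[i - 1][j] + gap,            # gap in the reference
                score[i][j - 1] + gap,            # gap in the read
            )

    # Trace back from the bottom-right corner, counting columns and matches.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            if read[i - 1] == ref[j - 1]:
                matches += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches, columns


if __name__ == "__main__":
    matches, columns = global_identity("GAAGCCGTTCTTATAGTAAT",
                                       "GTTGCCGTTCTTATAGTCCT")
    print(matches, columns, matches / columns)
```

Because the alignment is forced to span both sequences end to end, mismatches or gaps near the 5' or 3' ends count against the identity rather than being clipped away.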

Sorry, I forgot to mention that I am looking for an open-source tool

written 6.9 years ago by Pasta

(Note, Novoalign is free for academic and non-profit use)

written 6.9 years ago by Jeremy Leipzig

Oh, ok I will give it a try then. Thx

written 6.9 years ago by Pasta

Seconding novoalign. I've run it on deepseq datasets of 10-50 million 36bp reads; it's much slower than bowtie but still finishes within a day or two on a reasonably powerful desktop. The "native" novoalign output is pretty easy to parse, and it can also output SAM files.

written 6.9 years ago by Weronika
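
Since SAM output was mentioned, here is a rough sketch of how per-read identity could be pulled from a SAM record. It assumes the aligner writes the optional `NM:i:` tag (edit distance to the reference), which is defined in the SAM specification but not guaranteed to be present. Counting soft-clipped bases as non-matching is a choice made here so that clipped read ends lower the identity, as the question asks; the function name `sam_identity` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_identity(sam_line):
    """Return identity for one SAM alignment line, or None if unavailable.

    Assumptions: the optional NM tag is present, and soft-clipped bases
    are treated as non-matching positions.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":
        return None  # unmapped read or no alignment information

    # Optional tags start at column 12; look for the edit distance.
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:
        return None

    aligned = clipped = 0
    for length, op in CIGAR_RE.findall(cigar):
        if op in "M=XID":
            aligned += int(length)   # alignment columns
        elif op == "S":
            clipped += int(length)   # soft-clipped read bases
    # NM counts mismatches plus inserted and deleted bases.
    return (aligned - nm) / (aligned + clipped)


if __name__ == "__main__":
    line = "read1\t0\tref\t100\t60\t20M\t*\t0\t0\tGAAGCCGTTCTTATAGTAAT\t*\tNM:i:4"
    print(sam_identity(line))
```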