Question

Global Alignment Of Short Reads

2

12.1 years ago

Pasta ★ 1.3k

Hi,

I have short reads that I would like to align on a bacterial genome. I need to know the exact identity between the reads and the reference genome, even if there are mismatches or gaps near the 5' or 3' ends.

To make myself clear: for this 20 nt sequence (mismatches in lowercase), I would expect an identity of 16/20 = 80%:

GaaGCCGTTCTTATAGTaaT
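
The arithmetic above can be checked with a minimal Python sketch. It assumes the convention used in this example only (uppercase = match, lowercase = mismatch); the function name `identity_from_marked` is hypothetical and not part of any aligner.

```python
def identity_from_marked(seq: str) -> float:
    """Fraction of matching positions, where lowercase letters mark mismatches."""
    # Uppercase bases are treated as matches (convention of the example only).
    matches = sum(1 for base in seq if base.isupper())
    return matches / len(seq)


read = "GaaGCCGTTCTTATAGTaaT"
# 16 of the 20 positions are uppercase, so this prints 80%
print(f"{identity_from_marked(read):.0%}")
```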

I tried BWA, but for a 36 bp read it allows at most 5 mismatches, and it doesn't seem able to report the identity for reads with more mismatches or gaps. Is there any way to do that?

I am thinking of using a global aligner (free license) that should also work correctly on relatively short reads (36bp). If this aligner could output an easy-to-parse output file that would be X-mas for me.

I also tried Exonerate, but I could not figure out how to make it work the way I want (the options --exhaustive --model affine:global do not seem to work ...).

Any help is appreciated.

Thanks

alignment short • 3.1k views

ADD COMMENT • link updated 12.1 years ago by Jeremy Leipzig 22k • written 12.1 years ago by Pasta ★ 1.3k

0

You will need a more sensitive tool than BWA, but this comes at the price of increased run time. How many reads do you have? Nowadays you can run exact algorithms on up to a few thousand reads, but beyond that a heuristic approach is required.

ADD REPLY • link 12.1 years ago by Michael 54k

0

I have 13 million reads....

ADD REPLY • link 12.1 years ago by Pasta ★ 1.3k

score 2 · Answer 1 · 2012-03-20

2

12.1 years ago

Jeremy Leipzig 22k

Novoalign can do full Needleman-Wunsch. It is quite sensitive as well.
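
For readers unfamiliar with the term, here is a minimal sketch of Needleman-Wunsch global alignment that reports identity over the full alignment, so mismatches and gaps at the read ends are counted. It is an illustration only, not Novoalign's implementation: the scoring values (match +1, mismatch -1, linear gap -1), the function name, and the test sequences are assumptions, and pure Python like this is far too slow for millions of reads against a genome.

```python
def needleman_wunsch_identity(read, ref, match=1, mismatch=-1, gap=-1):
    """Globally align read to ref; return (matches, alignment_columns)."""
    n, m = len(read), len(ref)
    # score[i][j] = best score for aligning read[:i] with ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or mismatch column
                score[i - 1][j] + gap,      # read base against a gap
                score[i][j - 1] + gap,      # reference base against a gap
            )

    # Trace back from the bottom-right corner, counting columns and matches.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if read[i - 1] == ref[j - 1]:
                    matches += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches, columns


# Hypothetical read and reference window differing at four positions.
matches, columns = needleman_wunsch_identity(
    "GAAGCCGTTCTTATAGTAAT", "GTTGCCGTTCTTATAGTCCT"
)
print(f"identity = {matches}/{columns} = {matches / columns:.0%}")
```

Identity here is defined as matching columns divided by all alignment columns (including gap columns); other definitions exist, so pick the one that fits your analysis.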

ADD COMMENT • link 12.1 years ago by Jeremy Leipzig 22k

0

Sorry, I forgot to mention that I am looking for an open-source tool.

ADD REPLY • link 12.1 years ago by Pasta ★ 1.3k

0

(Note, Novoalign is free for academic and non-profit use)

ADD REPLY • link 12.1 years ago by Jeremy Leipzig 22k

0

Oh, ok I will give it a try then. Thx

ADD REPLY • link 12.1 years ago by Pasta ★ 1.3k

0

Seconding novoalign. I've run it on deepseq datasets of 10-50 million 36 bp reads; it's much slower than bowtie but still finishes within a day or two on a reasonably powerful desktop. The "native" novoalign output is pretty easy to parse, and it can also output SAM files.

ADD REPLY • link 12.1 years ago by Weronika ▴ 300
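
If SAM output is used, per-read identity can be derived from the CIGAR string and the edit-distance tag. The following is a rough sketch, assuming the aligner writes the standard optional `NM:i:` tag (edit distance to the reference); the function name `read_identity` and the example record are hypothetical. Clipped bases are counted as non-matching here, since the question asks for identity over the whole read.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_identity(sam_line: str):
    """Return whole-read identity for one SAM record, or None if unavailable."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read, nothing to score

    # Look for the NM tag among the optional fields (column 12 onward).
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:
        return None  # assumption violated: no edit distance reported

    ops = CIGAR_OP.findall(cigar)
    # Alignment columns: aligned bases plus inserted and deleted bases.
    aligned_cols = sum(int(n) for n, op in ops if op in "MID=X")
    # Soft- and hard-clipped bases are treated as non-matching columns.
    clipped = sum(int(n) for n, op in ops if op in "SH")
    matches = aligned_cols - nm  # NM counts mismatches plus gap bases
    return matches / (aligned_cols + clipped)


record = "r1\t0\tchr\t100\t60\t20M\t*\t0\t0\tGAAGCCGTTCTTATAGTAAT\t*\tNM:i:4"
print(read_identity(record))
```

Streaming a SAM file line by line and skipping header lines (those starting with `@`) keeps memory use flat even for millions of reads.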