Global Alignment Of Short Reads
12.1 years ago
Pasta ★ 1.3k

Hi,

I have short reads that I would like to align on a bacterial genome. I need to know the exact identity between the reads and the reference genome, even if there are mismatches or gaps near the 5' or 3' ends.

To make myself clear: for this 20 nt sequence (mismatches in lowercase), I would expect an identity of 16/20 = 80%:

GaaGCCGTTCTTATAGTaaT
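
To spell out the arithmetic, here is a minimal Python sketch of the identity I mean. It assumes mismatches are marked in lowercase exactly as in the sequence above; the function name `identity_from_marked` is made up for illustration.

```python
def identity_from_marked(seq: str) -> float:
    """Return the fraction of matching positions in a read where
    uppercase letters are matches and lowercase letters are mismatches."""
    if not seq:
        raise ValueError("empty sequence")
    matches = sum(1 for base in seq if base.isupper())
    return matches / len(seq)


# 16 uppercase positions out of 20
print(identity_from_marked("GaaGCCGTTCTTATAGTaaT"))  # 0.8
```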

I tried BWA, but for a 36 bp read it allows at most 5 mismatches, and it doesn't seem able to report the identity for reads with more mismatches or gaps (is there any way to do that?).

I am thinking of using a global aligner (free license) that also works correctly on relatively short reads (36 bp). If this aligner could produce an easy-to-parse output file, that would be X-mas for me.

Also, I tried Exonerate, but I cannot figure out how to make it work the way I want (the options --exhaustive --model affine:global do not seem to work...).

Any help is appreciated.

Thanks

alignment short • 3.1k views

You will need a more sensitive tool than BWA, but this comes at the price of increased run time. How many reads do you have? Nowadays you can run exact algorithms on up to a few thousand reads, but for more than that a heuristic approach is required.


I have 13 million reads....

12.1 years ago

Novoalign can do full Needleman-Wunsch. It is quite sensitive as well.
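
For anyone unfamiliar with what a full Needleman-Wunsch alignment gives you, here is a minimal Python sketch that globally aligns a read to a candidate reference window and reports the identity over all alignment columns. This is not Novoalign's implementation: the scoring values (match +1, mismatch -1, linear gap -2) and the reference window in the example are assumptions chosen for illustration, and a quadratic pure-Python loop like this is only practical for small numbers of reads.

```python
def nw_identity(read: str, ref: str,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Global (Needleman-Wunsch) alignment of read vs ref with a linear
    gap penalty; returns matches / alignment columns."""
    n, m = len(read), len(ref)

    def sub(i: int, j: int) -> int:
        # substitution score for read[i-1] against ref[j-1]
        return match if read[i - 1] == ref[j - 1] else mismatch

    # score[i][j] = best score aligning read[:i] with ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting matches and columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            if read[i - 1] == ref[j - 1]:
                matches += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # read base aligned to a gap in the reference
        else:
            j -= 1  # reference base aligned to a gap in the read
        columns += 1
    return matches / columns if columns else 0.0


# Hypothetical reference window that differs from the read at 4 positions.
print(nw_identity("GAAGCCGTTCTTATAGTAAT", "GCCGCCGTTCTTATAGTCCT"))  # 0.8
```

Because the alignment is end-to-end, mismatches or gaps near the 5' or 3' end count against the identity instead of being clipped away.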


Sorry, I forgot to mention that I am looking for an open-source tool.


(Note, Novoalign is free for academic and non-profit use)


Oh, ok I will give it a try then. Thx




Seconding Novoalign. I've run it on deep-sequencing datasets of 10-50 million 36 bp reads; it's much slower than Bowtie but still finishes within a day or two on a reasonably powerful desktop. The "native" Novoalign output is pretty easy to parse, and it can also output SAM files.
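
If you go the SAM route, a rough Python sketch for pulling a per-read identity out of each record could look like the following. It assumes the aligner writes the optional NM:i tag (edit distance to the reference) and a regular CIGAR string; the function name `sam_identity` is made up, and counting soft-clipped bases as non-matching is my own choice so that clipped read ends still lower the identity.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_identity(sam_line: str):
    """Return matches / columns for one SAM record, or None if the read
    is unaligned or has no NM tag."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11 or fields[5] == "*":
        return None

    # Look for the NM tag among the optional fields (column 12 onward).
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:
        return None

    aligned = clipped = 0
    for length, op in CIGAR_OP.findall(fields[5]):
        if op in "M=XID":
            aligned += int(length)   # alignment columns
        elif op == "S":
            clipped += int(length)   # soft-clipped read bases
    columns = aligned + clipped
    if columns == 0:
        return None
    # NM counts mismatches plus inserted and deleted bases.
    return (aligned - nm) / columns


record = "\t".join(["read1", "0", "chr", "100", "60", "20M", "*", "0", "0",
                    "GAAGCCGTTCTTATAGTAAT", "*", "NM:i:4"])
print(sam_identity(record))  # 0.8
```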
