Cons of Smith-Waterman Alignment
2
0
12 months ago
Student ▴ 30

Hello.

I was reading this article about Whole-Genome Alignment and Comparative Annotation. I find it difficult to understand this extract from the article, which describes the drawbacks of using the Smith-Waterman algorithm for the alignment problem:

Another consideration is how genome rearrangements complicate the alignment problem. Smith–Waterman and Needleman–Wunsch both produce alignments that have fixed order and orientation; that is, insertions, deletions, and substitutions are the only allowed edit operations. When looking within short or well-conserved sequences, like genes, this requirement is usually fulfilled. But at large evolutionary distances and looking within a sufficiently large window, genomes almost always contain more complex rearrangements with respect to each other—inversions, transpositions, and duplications all cause breaks in order and orientation that cannot be captured under constant order and orientation.

My question is: what do they mean by "orientation"?

Smith-Waterman Genomics Alignment DNA Sequences • 831 views
1
12 months ago
Guillermo ▴ 10

Hello there!

Normally when people talk about gene orientation, they are referring to whether the gene is encoded on the positive or negative strand of DNA.

I found a complementary video that may help: https://www.youtube.com/watch?v=JC6ew2xnJBA
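To make the strand idea concrete, here is a minimal Python sketch. The helper name `reverse_complement` and the example sequence are made up for illustration and do not come from the article. It assumes plain A/C/G/T input and shows that a gene on the negative strand, read 5'->3', is the reverse complement of what you see on the positive strand:

```python
# Hypothetical helper for illustration; assumes only A/C/G/T characters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read in the 5'->3' direction."""
    # Complement every base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


plus_strand = "ATGGCCTTAG"  # made-up sequence on the positive strand
minus_strand = reverse_complement(plus_strand)
print(minus_strand)  # CTAAGGCCAT
```

Taking the reverse complement twice gives back the original sequence, which is why "orientation" has only two possible values.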

1
12 months ago

Well, inversions and translocations are expected in a genome. A piece that is oriented as ---===>----> in the genome of one organism may be oriented as ---<====----> in another organism of the same species. Dynamic programming algorithms cannot detect this, obviously.

0

I do not understand why dynamic programming algorithms cannot detect this. For example, if I align a gene whose sequence is 5'->3' with another that is 3'->5', is the Smith-Waterman algorithm not able to do it?

1

You can check it here: https://www.ebi.ac.uk/Tools/psa/emboss_water/ (set the sequence type to DNA). Take this sequence:

ACGACACGTAGCAGCATGCAGCATCATACAGCATCACAGTCAGTTTCAGCAGCAAACTACAGT


and its reverse (the same letters read backwards):

TGACATCAAACGACGACTTTGACTGACACTACGACATACTACGACGTACGACGATGCACAGCA


The result:

EMBOSS_001         1 ACGAC--------ACGTAGCAGCATGCAGCATCATACAGCA     33
                     |||||        |||||..|..||||        ||||||
EMBOSS_001        31 ACGACATACTACGACGTACGACGATGC--------ACAGCA     63


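Not really what we would expect, right? Only short chance matches are aligned; the fact that the second sequence is the first one read backwards goes undetected.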
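If you would rather try this locally, the experiment can be sketched in a few lines of Python. This is a minimal, score-only Smith-Waterman with a linear gap penalty. The function name `sw_score` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, not the EMBOSS water defaults, so the numbers will differ from the web tool:

```python
def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b (illustrative scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends only on its left, upper and upper-left neighbours,
            # so both sequences are always read in the same direction.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


seq = "ACGACACGTAGCAGCATGCAGCATCATACAGCATCACAGTCAGTTTCAGCAGCAAACTACAGT"
rev = seq[::-1]  # the reversed sequence from the example above

same = sw_score(seq, seq)            # perfect match: 2 * 63 = 126
flipped = sw_score(seq, rev)         # lower: only short chance matches
restored = sw_score(seq, rev[::-1])  # flip it back ourselves: 126 again

print(same, flipped, restored)
```

The recurrence never looks at a sequence backwards, so the only way to recover the full score is to flip one sequence yourself before aligning, as `restored` does. For real DNA on the opposite strand you would try the reverse complement rather than the plain reverse.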

0

OK, thank you. I looked at the algorithm again, and it does make sense to me that it does not detect this.

1

Yep, I wanted to answer "see the algorithm itself", but then realised that you may not come from a computer science background, so I gave an example =)