Question

Forum: PALO: The Importance (And Impact) Of Aligning Matching Isoforms In Multiple Sequence Alignments

10.1 years ago

Biojl ★ 1.7k

Protein ALignment Optimizer (PALO) is an algorithm that selects the best combination of protein isoforms among orthologous genes when building a multiple alignment. You can upload files obtained from Ensembl, and the tool will tell you which combination of isoforms is the most suitable to align.

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, a single protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length.
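To make the contrast between Longest and a length-similarity criterion concrete, here is a minimal sketch in Python. It is not PALO's actual code. It assumes that "most similar in length" can be scored as the spread (longest minus shortest) of the chosen protein lengths, and it searches every combination exhaustively, which is only practical for small families. The function names, the input layout, and the example lengths are all hypothetical.

```python
from itertools import product


def longest_per_gene(family):
    """Baseline (Longest): pick the longest protein isoform of each gene."""
    return {
        gene: max(isoforms, key=lambda iso: iso[1])
        for gene, isoforms in family.items()
    }


def most_similar_lengths(family):
    """Pick one isoform per gene so that the length spread (max - min) is smallest.

    Assumption: spread is used as the similarity score; the published
    heuristic may score and search combinations differently.
    """
    genes = sorted(family)
    best_combo, best_spread = None, None
    # Exhaustive search over every one-isoform-per-gene combination.
    for combo in product(*(family[g] for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_combo, best_spread = combo, spread
    return dict(zip(genes, best_combo)), best_spread


# Hypothetical gene family: gene ID -> list of (isoform ID, protein length).
family = {
    "human_gene": [("h1", 520), ("h2", 431)],
    "mouse_gene": [("m1", 428)],
    "chicken_gene": [("c1", 610), ("c2", 435), ("c3", 300)],
}

print(longest_per_gene(family))
print(most_similar_lengths(family))
```

With these made-up lengths, Longest picks isoforms of 520, 428 and 610 residues, while the length-similarity criterion picks 431, 428 and 435. The number of combinations grows multiplicatively with the number of isoforms per gene, so exhaustive search like this is only an illustration of the selection criterion.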

Take a look at the tutorial section. You can either use the online version (section Run) or download the raw code (python-github) and run it on your local machine.

evolution • 3.5k views

updated 13 months ago by Ram 43k • written 10.1 years ago by Biojl ★ 1.7k

Good job! However, the website returns an error when clicking on "Run" -> "Last Results" if no job has been run previously. You should also improve the documentation, because I could not understand how you select the most common splicing isoform for the alignment without reading the paper. It would also help to add some example input files to the "Run" page, i.e. something that can be run without following the whole tutorial.

10.1 years ago by Giovanni M Dall'Olio 28k

Thanks Giovanni! There is a section called Theory (http://evolutionarygenomics.imim.es/josepl/codeigniter/index.php/pages/view/theory) where the algorithm is explained, but maybe I could develop it further. The ready-made input files are available as links in the File Format section of the tutorial; I will add them to the Run section as well :). Finally, I'm aware of that bug. I'm a bit short of time right now, but in theory no one should go to Last Results without running anything (yes, we all know the theory... ;)). Thanks again for your input, it's very useful.

10.1 years ago by Biojl ★ 1.7k