Phylogeny Based On Genome Rearrangement Pattern?
8.7 years ago
qiyunzhu ▴ 430

Hello all, I am trying to resolve the phylogeny of a bacterial group. Trees inferred using conventional approaches based on sequence data are conflicting, so I'm thinking about using genome rearrangement data to build trees. I first used Mauve to align the genomes and identify the homologous genomic blocks. Then I exported the data (permutation matrix) and tried to use other programs to build a tree from it. I'm not interested in the rearrangement history, just the species tree. So far I have tried MGR, MGRA and BADGER. MGR runs, but very, very slowly. The latter two programs just don't work for my case. So I am asking here whether anyone knows of a better solution. Thank you for your time reading this!

phylogenetics genomics • 2.2k views

Can you try coding the rearrangements as 1/0 characters? Old-school cladistics uses this for all kinds of morphological characters.

I don't know of a way to code it as binary data; if there were one, I would already have solved this using RAxML...

It's maybe a little off-base, but there was a paper at ISMB last month which did something similar, but using FISH copy number data, and reconstructing human cancer phylogenies. More importantly, they built quite a specialised and highly efficient piece of software to do just that. I wonder if your problem might not be similar enough that you could adapt their method?

http://bioinformatics.oxfordjournals.org/content/29/13/i189.full

Thanks for your suggestion! I browsed the program FISHtrees. Unfortunately it does not seem to be the type we are looking for. Genome rearrangement data is not alignable binary or multi-state data. It's something like:

P1 +a -c -b $ +d +e +f $
P2 +d +e +b +c $ +a +f $
P3 +a -d $ -c -b +e -f $
...
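To make the structure of that data concrete, here is a minimal Python sketch of reading it. It assumes whitespace-separated tokens, where a token starting with + or - is a signed block, "$" closes a chromosome, and anything else is a genome name. The function name parse_genomes is made up for illustration and is not part of Mauve, MGR or any other tool mentioned here.

```python
def parse_genomes(text):
    """Parse 'P1 +a -c $ +d $ P2 ...' into {name: [chromosome, ...]}.

    Each chromosome is a list of (block, sign) tuples, sign being +1 or -1.
    Assumed token rules: '+x' / '-x' is a signed block, '$' ends a
    chromosome, any other token starts a new genome.
    """
    genomes = {}
    name, chrom = None, []
    for tok in text.split():
        if tok == "$":
            genomes[name].append(chrom)
            chrom = []
        elif tok[0] in "+-":
            chrom.append((tok[1:], 1 if tok[0] == "+" else -1))
        else:
            name = tok
            genomes[name] = []
    return genomes


example = ("P1 +a -c -b $ +d +e +f $ "
           "P2 +d +e +b +c $ +a +f $ "
           "P3 +a -d $ -c -b +e -f $")
genomes = parse_genomes(example)
for genome_name, chromosomes in genomes.items():
    print(genome_name, chromosomes)
```

Every genome holds the same six blocks; only their order, orientation and split across chromosomes differ.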

It looks to me like that would convert quite well to a binary matrix, with columns for genome regions, rows for species (or individuals), and each entry containing {1,0} to denote presence or absence of that region?

I'm afraid that isn't the case. It is the order of the genes that matters, rather than the presence or absence of a gene at each genomic locus.
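A quick way to see this is to code the three example genomes from this thread by gene content alone. The sketch below is purely illustrative (the helper gene_content is a hypothetical name, and the permutations are the toy ones from my earlier comment): once signs and order are dropped, every genome gets the same row, so the matrix carries no phylogenetic signal.

```python
genomes = {
    "P1": "+a -c -b $ +d +e +f $",
    "P2": "+d +e +b +c $ +a +f $",
    "P3": "+a -d $ -c -b +e -f $",
}


def gene_content(perm):
    # Drop signs and chromosome terminators, keeping only block identity.
    return {tok[1:] for tok in perm.split() if tok != "$"}


content = {name: gene_content(perm) for name, perm in genomes.items()}
all_genes = sorted(set().union(*content.values()))

# One row per genome, one column per block: 1 = present, 0 = absent.
rows = {
    name: "".join("1" if gene in genes else "0" for gene in all_genes)
    for name, genes in content.items()
}
for name, row in rows.items():
    print(name, row)
```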

Here's a paper where they coded gene order and presence/absence as a matrix for baculoviruses. It sounds like what you are looking for: http://www.ncbi.nlm.nih.gov/pubmed/11483757
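For what it's worth, one common way to turn gene order into 1/0 characters is to score the presence or absence of each adjacency (which end of one block sits next to which end of another). The Python sketch below shows that general idea on the toy permutations from this thread. It is not necessarily the coding used in the paper above. The function names are made up for illustration, and chromosomes are treated as linear, which is an assumption: for circular chromosomes you would also add the adjacency joining the last block back to the first.

```python
genomes = {
    "P1": [["+a", "-c", "-b"], ["+d", "+e", "+f"]],
    "P2": [["+d", "+e", "+b", "+c"], ["+a", "+f"]],
    "P3": [["+a", "-d"], ["-c", "-b", "+e", "-f"]],
}


def ends(signed_block):
    """Return (left_end, right_end) of a block as read along the chromosome."""
    sign, block = signed_block[0], signed_block[1:]
    tail, head = block + "_t", block + "_h"
    return (tail, head) if sign == "+" else (head, tail)


def adjacencies(chromosomes):
    """Set of unordered end-to-end adjacencies (linear chromosomes assumed)."""
    found = set()
    for chrom in chromosomes:
        for first, second in zip(chrom, chrom[1:]):
            found.add(frozenset((ends(first)[1], ends(second)[0])))
    return found


def binary_matrix(genomes):
    """Columns = every adjacency seen in any genome; rows = 0/1 strings."""
    per_genome = {name: adjacencies(chroms) for name, chroms in genomes.items()}
    columns = sorted(set().union(*per_genome.values()), key=sorted)
    rows = {
        name: "".join("1" if col in adj else "0" for col in columns)
        for name, adj in per_genome.items()
    }
    return columns, rows


columns, rows = binary_matrix(genomes)
for name, row in rows.items():
    print(name, row)
```

Because adjacencies are stored as unordered pairs of block ends, "-c -b" in P1 and "+b +c" in P2 score as the same character, which is what you want for an inverted segment. A matrix like this can then be analysed with any program that accepts binary characters.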

Yes it is! Thanks for recommending this! I later found a couple of related articles, including the latest ones, such as: www.ncbi.nlm.nih.gov/pubmed/23424133