Phylogenetic tree construction from a large MAFFT-aligned genome dataset?
4.0 years ago
k.kathirvel93 ▴ 300

Hi everyone,

I have an MSA file of 11,000 genomes (about 30k bases each) aligned with MAFFT. I want to construct a phylogenetic tree from this large MSA file (500 MB). I tried MEGA X and RAxML, but both ran for a very long time and eventually crashed on my workstation (Ubuntu 16.04, 8 GB RAM, 1 TB HDD). Can anyone suggest a way to accomplish this? Thanks.

alignment sequence genome • 2.6k views

You should reduce redundancy first, since it is unlikely that all 11K genomes are completely unique in sequence. Are these SARS genomes?


A multiple sequence alignment of that many sequences which are that long is highly likely to be spurious, unless all the genomes are incredibly similar (in which case you might as well remove redundant identical sequences).
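As a rough illustration of removing identical sequences, here is a minimal Python sketch that uses only the standard library. The function names (`read_fasta`, `deduplicate`) are made up for this example, and it assumes the alignment is in FASTA format and that "identical" means identical after stripping gap characters and ignoring case. It keeps the aligned row of the first genome seen in each group and records the headers of the duplicates so they can be mapped back onto the tree later.

```python
import hashlib


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Collapse records whose ungapped, upper-cased sequences are identical.

    Returns a dict mapping each representative header to a tuple of
    (aligned sequence of the representative, list of duplicate headers).
    """
    groups = {}
    seen = {}  # sequence digest -> representative header
    for header, seq in records:
        # Assumption: '-' and '.' are the only gap characters in the file.
        ungapped = seq.replace("-", "").replace(".", "").upper()
        key = hashlib.sha1(ungapped.encode()).hexdigest()
        if key in seen:
            groups[seen[key]][1].append(header)
        else:
            seen[key] = header
            groups[header] = (seq, [])
    return groups
```

To use it on a file, pass an open file handle to `read_fasta` and write out one record per key of the returned dict. Near-identical genomes are not collapsed by this sketch; for that, a clustering tool is still needed.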


Along the lines of what genomax and Joe already suggested, you can reduce redundancy in two ways:

  1. At the whole-genome sequence level, with CD-HIT or a similar tool, keeping track of which genomes are collapsed together (even at 100% sequence identity).
  2. Regardless of step 1, in the MAFFT alignment itself, by removing uninformative alignment columns with a tool such as the QIIME 2 pipeline at https://docs.qiime2.org/2020.2/plugins/available/phylogeny/align-to-tree-mafft-fasttree/ or one of the tools listed in Table 2 of this method-comparison paper: https://academic.oup.com/mbe/article/30/3/689/1040880

Then submit this reduced representation to RAxML at the CIPRES Gateway; that might help.
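The column-removal idea can be sketched in a few lines of Python. This is only an illustration, not what the linked tools do internally: it treats a column as uninformative when its gap fraction exceeds a threshold, and both the function names and the `max_gap_fraction` parameter are hypothetical choices for this example. It assumes all aligned rows have the same length and are already loaded as strings.

```python
def columns_to_keep(seqs, max_gap_fraction=0.9):
    """Return indices of columns whose gap fraction is at most max_gap_fraction."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    gap_counts = [0] * length
    for s in seqs:
        for i, ch in enumerate(s):
            # Assumption: '-' and '.' are the gap characters.
            if ch in "-.":
                gap_counts[i] += 1
    n = len(seqs)
    return [i for i, c in enumerate(gap_counts) if c / n <= max_gap_fraction]


def filter_columns(seqs, keep):
    """Rebuild each aligned row using only the kept column indices."""
    return ["".join(s[i] for i in keep) for s in seqs]
```

For example, `filter_columns(seqs, columns_to_keep(seqs, 0.5))` drops every column in which more than half of the rows have a gap. On an alignment of this size a pure-Python loop will be slow, so a dedicated trimming tool is likely the better choice in practice.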

4.0 years ago
Anand Rao ▴ 630

Have you tried running the same input file on RAxML at the CIPRES Gateway at http://www.phylo.org/ ?

In the past, you had to create an account for this. I have not used it recently, so I am not aware of the current signup, geographic location, and other user requirements. Good luck!
