Question: Loop Thousands Of Orthologous MSAs Into RAxML
7.7 years ago by
Louis50 wrote:

Greetings :)

I'm having difficulty feeding RAxML many MSAs. My final goal is to create a single ML majority-rule consensus tree based on 6,405 alignments of orthologous genes. My pipeline thus far is OrthoMCL > MUSCLE > trimAl > RAxML. My bottleneck is RAxML. Here's what I tried (as well as variations of this loop):

Run full BS and ML analysis in raxml

for f in $(ls raxmltest/vbro*.phy); do
    raxmlHPC -f a -x 12345 -p 12345 -# 100 -m PROTGAMMAJTTF -s $f >${f/%.phy} -n Test;
done

My issue is that running $f -n Test only writes output for 1 MSA. I would like to write output for all MSAs. Any advice or assistance is much appreciated. Even better if you can help me with the next step as well: using the 6,405 trees to build a single consensus. I know the following works for one MSA.

Use bootstrap replicates to build majority-rule consensus tree

for f in $(ls raxmltest/vbro*.phy); do
    raxmlHPC -m PROTGAMMA -J MR -z RAxML_bootstrap.Test -n Test;
done

Thank you!

consensus tree • 2.7k views
modified 7.7 years ago by Dan Gaston7.1k • written 7.7 years ago by Louis50

Why not concatenate all the alignments together? As far as I know, concatenation is a standard approach in multi-locus phylogeny reconstruction. Maybe you have different taxon sampling in each alignment?

written 7.7 years ago by Vitis2.2k

Thank you for the reply. I think concatenating the sequences would produce an accurate tree and I will do that; however, a few recent publications are big on using many trees to create a consensus tree and I wanted to try it out as well. I do not know how much they will vary. I'll continue posting what I learn to this thread.

written 7.7 years ago by Louis50

I think you're talking about the supertree approaches. Usually they're used when taxon sampling can't be matched for each locus and you'd like to keep as much information as possible (concatenation would throw away alignments that have missing taxa).

written 7.7 years ago by Vitis2.2k

My bacteria are especially adept at lateral gene transfer so you're correct that not every strain matched locus for locus. It would be nice to retain as much data as possible in the tree.

written 7.7 years ago by Louis50

Lateral gene transfer is a major problem in phylogeny reconstruction of prokaryotes. Indeed, supertree methods might be better in this case, to resolve the lateral gene transfers where you see strong topological disagreement among loci.

written 7.7 years ago by Vitis2.2k
7.7 years ago by
Louis50 wrote:

OK - I've found an answer. A friend who knows much more about phylogenetics than I do, and who has been through this before, showed me how to make it happen. In summary, you create a working directory containing a subdirectory for each MSA. You can then use a combination of 2 scripts to loop through all the subdirectories and call raxmlHPC for each MSA. Output was written to the subdirectories, and the run returned 4,346 trees. Some of the MSAs, based on orthomcl orthologous clusters, contained too few strains to create a tree. If you have questions I can provide more specifics.
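
The two scripts themselves are not posted here, so the following is only a minimal sketch of the layout described above, not the actual scripts. It assumes one subdirectory per MSA, alignments ending in .phy, and the same RAxML options as in the question; the function name run_in_subdirs is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run RAxML once per alignment, inside the
# subdirectory that holds the alignment, so output lands there too.
# Usage: run_in_subdirs <workdir>
run_in_subdirs() {
    local workdir=$1 dir aln name
    for dir in "$workdir"/*/; do             # each subdirectory of the working directory
        [ -d "$dir" ] || continue            # glob matched nothing
        for aln in "$dir"*.phy; do           # each alignment in that subdirectory
            [ -e "$aln" ] || continue
            name=$(basename "$aln" .phy)     # unique run name taken from the file name
            # Subshell: the cd does not leak into the next iteration.
            ( cd "$dir" && raxmlHPC -f a -x 12345 -p 12345 -# 100 \
                  -m PROTGAMMAJTTF -s "$name.phy" -n "$name" > "$name.log" 2>&1 ) \
                || echo "RAxML failed for $aln" >&2
        done
    done
}
```

Calling run_in_subdirs /path/to/workdir leaves each run's RAxML_* files and a log next to its alignment. Runs that fail (for example, alignments with too few strains) are reported on stderr and the loop carries on.
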

Now, I still need to make that consensus tree :/

written 7.7 years ago by Louis50


I have a similar problem and I am trying to solve it. I tried the two for-loops you suggested, but I think I get an error on the first file and then I am thrown outside the loop and everything stops there. See below what I do and what I get:

for subdirectory in .; do for file in $subdirectory; do raxmlHPC -m GTRGAMMA -p 12345 -s $file -b 12345 -#100 -T 2 -n tree; done; done

Use raxml with AVX support with overriden number of threads

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file


It is probably something wrong in my loop but my brain is stuck right now...

modified 22 days ago • written 22 days ago by katerinapargana0
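
A likely cause, judging only from the loop as written: "for subdirectory in ." iterates over the single word ".", and "for file in $subdirectory" then hands that same "." to -s, so RAxML is asked to parse a directory rather than an alignment. Also, -n tree is identical for every run. Below is a small sketch that lists the alignment files with shell globs instead; the helper name list_alignments and the .phy extension are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every alignment file found one level
# below a working directory.
# Usage: list_alignments [workdir] [extension]
list_alignments() {
    local workdir=${1:-.} ext=${2:-phy} dir file
    for dir in "$workdir"/*/; do          # glob expands to each subdirectory
        for file in "$dir"*."$ext"; do    # glob expands to each alignment in it
            [ -f "$file" ] && printf '%s\n' "$file"
        done
    done
    return 0
}

# Possible use with the options from the post above, giving each run
# its own name instead of the fixed "tree":
#   list_alignments . phy | while read -r file; do
#       raxmlHPC -m GTRGAMMA -p 12345 -s "$file" -b 12345 -# 100 -T 2 \
#                -n "$(basename "$file" .phy)"
#   done
```

Running list_alignments on its own first is a cheap way to check that the loop sees the files you expect before starting thousands of RAxML runs.
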
7.7 years ago by
Dan Gaston7.1k wrote:

The simplest solution is to append the tree results to one file. However, RAxML requires a unique "basename" (the -n run name) for every run. Since you are already running it in a loop in a script, it is trivial to make the run name a variable instead of the hardcoded Test. Yes, this will create a lot of files, but you can clean those up afterwards. Then take the tree file from each run and issue another unix command to append it to a master file (Trees.txt, for example).
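
A rough sketch of that idea, reusing the options from the question. The function names run_all and build_consensus are made up, and collecting RAxML_bestTree.<name> as "the tree file" is an assumption; PROTGAMMAJTT in the consensus step is also an assumption, swapped in for the PROTGAMMA used in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one RAxML run per alignment, each with its own
# run name, then append each best tree to a master file.
# Usage: run_all <alignment_dir> <master_file>
run_all() {
    local aln_dir=$1 master=$2 f name
    : > "$master"                           # start from an empty master file
    for f in "$aln_dir"/*.phy; do
        [ -e "$f" ] || continue             # glob matched nothing
        name=$(basename "$f" .phy)          # e.g. vbro0001.phy -> vbro0001
        # Only -n differs between runs; RAxML writes into the current directory.
        raxmlHPC -f a -x 12345 -p 12345 -# 100 -m PROTGAMMAJTTF \
                 -s "$f" -n "$name" > "$name.log" 2>&1 \
            || { echo "RAxML failed for $f" >&2; continue; }
        cat "RAxML_bestTree.$name" >> "$master"
    done
}

# Hypothetical helper: majority-rule consensus over the collected trees.
# Usage: build_consensus <master_file> <run_name>
build_consensus() {
    # Assumption: PROTGAMMAJTT stands in for the model name here.
    raxmlHPC -m PROTGAMMAJTT -J MR -z "$1" -n "$2"
}
```

For example: run_all raxmltest Trees.txt, then build_consensus Trees.txt Consensus. As far as I know, the -J consensus expects every tree in the -z file to contain the same taxa, so gene trees with missing strains may need a supertree-style method instead; check the RAxML manual before relying on it.
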

written 7.7 years ago by Dan Gaston7.1k