Loop Thousands Of Orthologous MSAs Into RAxML
12.2 years ago
Louis ▴ 50

Greetings :)

I'm having difficulty feeding RAxML many MSAs. My final goal is to create a single ML majority-rule consensus tree based on 6,405 alignments of orthologous genes. My pipeline thus far is OrthoMCL > MUSCLE > trimAl > RAxML, and RAxML is the bottleneck. Here's what I tried (along with variations of this loop):

Run full bootstrap (BS) and ML analysis in RAxML

for f in $(ls raxmltest/vbro*.phy); do
    raxmlHPC -f a -x 12345 -p 12345 -# 100 -m PROTGAMMAJTTF -s $f >${f/%.phy} -n Test;
done

My issue is that running $f -n Test only writes output for 1 MSA. I would like to write output for all MSAs. Any advice or assistance is much appreciated. Even better if you can help me with the next step as well: using the 6,405 trees to build a single consensus. I know the following works for one MSA.

Use bootstrap replicates to build majority-rule consensus tree

for f in $(ls raxmltest/vbro*.phy); do
    raxmlHPC -m PROTGAMMA -J MR -z RAxML_bootstrap.Test -n Test;
done

Thank you!

Why not concatenate all the alignments? As far as I know, concatenation is a standard approach in multi-locus phylogeny reconstruction. Or do you have different taxon sampling in each alignment?

Thank you for the reply. I think concatenating the sequences would produce an accurate tree and I will do that; however, a few recent publications are big on using many trees to create a consensus tree and I wanted to try it out as well. I do not know how much they will vary. I'll continue posting what I learn to this thread.

I think you're talking about supertree approaches. They're usually used when taxon sampling can't be matched across loci and you'd like to keep as much information as possible (concatenation would throw away alignments that have missing taxa).

My bacteria are especially adept at lateral gene transfer, so you're correct that not every strain matched locus for locus. It would be nice to retain as much data as possible in the tree.

Lateral gene transfer is a major problem in phylogeny reconstruction of prokaryotes. Indeed, supertree methods might be better in this case, to resolve the lateral gene transfers where you see strong topological disagreement among loci.

12.2 years ago
Louis ▴ 50

OK - I've found an answer. A friend who knows much more about phylogenetics than I do, and who has been through this before, showed me how to make it happen. In summary, you create a working directory containing a subdirectory for each MSA. You can then use a combination of 2 loop.sh scripts to loop through all the subdirectories and call raxmlHPC for each MSA. Output was written to the subdirectories, and the runs returned 4,346 trees. Some of the MSAs, which are based on OrthoMCL orthologous clusters, contained too few strains to build a tree. If you have questions I can provide more specifics.
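
For anyone following along, the layout described above can be sketched roughly as follows. This is only an illustration and not the actual loop.sh scripts: the function name run_in_subdirs, the RAXML_BIN variable and the fixed run name tree are hypothetical, and it assumes each subdirectory holds exactly one .phy alignment. Because every run writes into its own directory, the same run name can be reused without clashes.

```shell
# Sketch only: one subdirectory per MSA, RAxML is run inside each one.
# RAXML_BIN and run_in_subdirs are illustrative names, not part of RAxML.
RAXML_BIN=${RAXML_BIN:-raxmlHPC}

run_in_subdirs() {
    workdir=$1
    for dir in "$workdir"/*/; do
        [ -d "$dir" ] || continue
        (
            # the subshell keeps the cd local to this iteration
            cd "$dir" || exit 1
            for msa in *.phy; do
                [ -e "$msa" ] || continue
                # a failed run (e.g. too few taxa) is reported and skipped
                "$RAXML_BIN" -f a -x 12345 -p 12345 -# 100 \
                    -m PROTGAMMAJTTF -s "$msa" -n tree > raxml.log 2>&1 \
                    || echo "RAxML failed in $dir, see raxml.log" >&2
            done
        )
    done
}
```

Calling run_in_subdirs /path/to/workdir would then leave the RAxML output files (for example RAxML_bestTree.tree) next to each alignment. If RAXML_BIN is overridden, it should be an absolute path or a command on PATH, since the function changes directory before calling it.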

Now, I still need to make that consensus tree :/
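
One possible way to approach the consensus step, sketched under some assumptions: the per-gene trees are named RAxML_bestTree.* (the name RAxML gives the best-scoring ML tree), and all of them contain the same set of taxa, which as far as I know RAxML's consensus option expects. If the taxon sets differ between genes, a supertree method is probably needed instead. build_consensus, RAXML_BIN, all_trees.tre and the run name consensus are made-up names.

```shell
# Sketch only: gather the per-gene trees and ask RAxML for a
# majority-rule consensus. Names below are illustrative.
RAXML_BIN=${RAXML_BIN:-raxmlHPC}

build_consensus() {
    workdir=$1
    trees=all_trees.tre
    # one Newick tree per line, gathered from every subdirectory
    find "$workdir" -type f -name 'RAxML_bestTree.*' -exec cat {} + > "$trees"
    echo "collected $(grep -c . "$trees") trees in $trees"
    # -J MR requests a majority-rule consensus of the trees passed with -z
    "$RAXML_BIN" -J MR -m PROTGAMMAJTTF -z "$trees" -n consensus
}
```

Run as build_consensus /path/to/workdir from the directory where the result should land. The consensus should then be written to a file named RAxML_MajorityRuleConsensusTree.consensus.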

Hello!!

I have a similar problem and I am trying to solve it. I tried the two for-loops you suggested, but I think I hit an error on the first file, get thrown out of the loop, and everything stops there. Here is what I run and what I get:

for subdirectory in .; do for file in $subdirectory; do raxmlHPC -m GTRGAMMA -p 12345 -s $file -b 12345 -#100 -T 2 -n tree; done; done

Use raxml with AVX support with overriden number of threads

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

TOO FEW SPECIES

It is probably something wrong in my loop but my brain is stuck right now...
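
A guess at what is going wrong, offered tentatively: `for subdirectory in .` loops exactly once over the literal word `.`, and `for file in $subdirectory` then loops once over `.` again, so RAxML is handed the directory itself via -s rather than an alignment. A glob such as */*.phy expands to the actual files. The sketch below also reads the taxon count from the first line of each PHYLIP file and skips small alignments, on the assumption that RAxML needs at least 4 taxa (which would explain TOO FEW SPECIES). usable_alignments and min_taxa are hypothetical names, and the .phy extension is assumed.

```shell
# Sketch only: list the alignments one level below a working directory
# that have enough taxa to be worth passing to RAxML.
usable_alignments() {
    workdir=$1
    min_taxa=${2:-4}                        # assumed minimum, adjust as needed
    for file in "$workdir"/*/*.phy; do
        [ -e "$file" ] || continue          # glob matched nothing
        # PHYLIP header line: "<number of taxa> <number of sites>"
        ntaxa=$(awk 'NR == 1 { print $1; exit }' "$file")
        case $ntaxa in
            ''|*[!0-9]*) echo "skipping $file: no PHYLIP header" >&2; continue ;;
        esac
        if [ "$ntaxa" -ge "$min_taxa" ]; then
            echo "$file"
        else
            echo "skipping $file: only $ntaxa taxa" >&2
        fi
    done
}
```

Each printed path could then be passed to raxmlHPC with -s, using something like $(basename "$file" .phy) as the -n run name so that no two runs share a name.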

12.2 years ago
DG 7.3k

The simplest solution is to append the tree results to one file. However, RAxML requires a unique "basename" (the run name given with -n) for every run. Since you are already running it in a loop in a script, it is trivial to make the run name a variable instead of the hardcoded Test. Yes, this will create a lot of files, but you can clean those up afterwards. Then take the tree file from each run and issue another Unix command to append its contents to a master file (Trees.txt, for example).
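
A rough sketch of that idea, with the run name taken from each alignment's file name. run_all_msas and RAXML_BIN are illustrative names; RAxML_bestTree.<name> is the file RAxML writes for the best-scoring ML tree, and the other options are copied from the question.

```shell
# Sketch only: unique run name per alignment, best trees appended to one file.
RAXML_BIN=${RAXML_BIN:-raxmlHPC}

run_all_msas() {
    msa_dir=$1
    master=${2:-Trees.txt}
    : > "$master"                        # start with an empty master file
    for f in "$msa_dir"/*.phy; do
        [ -e "$f" ] || continue
        name=$(basename "$f" .phy)       # unique run name per alignment
        if "$RAXML_BIN" -f a -x 12345 -p 12345 -# 100 -m PROTGAMMAJTTF \
               -s "$f" -n "$name" > "$name.log" 2>&1; then
            cat "RAxML_bestTree.$name" >> "$master"
        else
            echo "RAxML failed for $f, see $name.log" >&2
        fi
    done
}
```

For the layout in the question this would be called as run_all_msas raxmltest Trees.txt. RAxML stops if output files for a given run name already exist, so files from an earlier attempt need to be removed or moved before rerunning.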
