Question: Looking for a phylogenetic Python API
virus_n00b wrote:

I have a sequence file (in FASTA format) containing 12,527 sequences, each of length 61. I need to construct a phylogenetic tree from these sequences. I am using the Python package Biopython to perform the steps below:

  1. Use the Clustal Omega command-line tool to generate the alignment file.
  2. Read the generated alignment file and plot the phylogenetic tree.
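
The two steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the poster's actual script: the function names and file names are hypothetical, it assumes the `clustalo` binary is on the PATH and Biopython is installed, and neighbour joining on identity distances is chosen purely for illustration. Biopython computes the distance matrix in pure Python, so this is unlikely to be practical for 12,527 sequences.

```python
import subprocess


def clustalo_command(fasta_in, aln_out):
    """Build the Clustal Omega call for step 1 (Clustal-format output)."""
    return ["clustalo", "-i", fasta_in, "-o", aln_out, "--outfmt=clu", "--force"]


def align_and_draw(fasta_in, aln_out):
    """Run step 1, then read the alignment and draw a tree (step 2)."""
    # Imported here so the command builder above works without Biopython.
    from Bio import AlignIO, Phylo
    from Bio.Phylo.TreeConstruction import (
        DistanceCalculator,
        DistanceTreeConstructor,
    )

    subprocess.run(clustalo_command(fasta_in, aln_out), check=True)
    alignment = AlignIO.read(aln_out, "clustal")
    # Identity distances + neighbour joining: an illustrative choice only.
    constructor = DistanceTreeConstructor(DistanceCalculator("identity"), "nj")
    tree = constructor.build_tree(alignment)
    Phylo.draw_ascii(tree)
    return tree
```

With hypothetical file names, a call would look like `align_and_draw("sequences.fasta", "sequences.aln")`.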

However, Clustal Omega throws the following error:

HHalignWrapper:hhalign_wrapper.c:1419: problem in alignment (profile sizes: 1 + 1) (VS_bf95329ccdd7babf5bee3f5b2f5a2f50 + VS_b22382a3e858d9a132f3263009383220), forcing Viterbi
        hh-error-code=4 (mac-ram=8000) hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0) HHalignWrapper:hhalign_wrapper.c:1447: problem in alignment, Viterbi did not work
        hh-error-code=4 (mac-ram=64000) hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0) FATAL: could not perform alignment -- bailing out

When I run clustalw2 on the same sequences, it works but takes too long. Can someone point out what exactly is causing the issue?

Tags: sequencing, alignment
written 2.6 years ago by virus_n00b • modified 2.6 years ago by Joe

It looks like you have empty sequences; make sure your input is OK. Clustal uses fairly simple algorithms to build the tree, so if your desired output is the phylogenetic tree, I would recommend using other algorithms.
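
One quick way to test the empty-sequence hypothesis is to scan the FASTA input for records that contain no residues. The following is a minimal standard-library sketch; the function names and the choice of `-` and `.` as gap characters are assumptions for illustration, and the example records are made up.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_empty_records(lines, gap_chars="-."):
    """Return headers whose sequence has no residues once gaps are removed."""
    empty = []
    for header, seq in read_fasta(lines):
        residues = seq
        for gap in gap_chars:
            residues = residues.replace(gap, "")
        if not residues:
            empty.append(header)
    return empty


# Made-up records: seq2 has no sequence line content, seq3 is gaps only.
example = [">seq1", "ACGT", ">seq2", "", ">seq3", "----", ">seq4", "AC", "GT"]
print(find_empty_records(example))  # ['seq2', 'seq3']
```

For a real file, pass an open file handle instead of the example list, for instance `find_empty_records(open("sequences.fasta"))` with a hypothetical file name.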

written 2.6 years ago by Asaf

I'm sure there are no empty sequences. My desired output is indeed a phylogenetic tree, so can you suggest some algorithms that scale to thousands of sequences?

written 2.6 years ago by virus_n00b

See, for instance, the ape package in R.

written 2.6 years ago by Asaf

Joe wrote:

I would suggest you first build the tree with a dedicated tool that scales well, rather than trying to do everything through Python.

The MUSCLE aligner is fast for large numbers of sequences, and I would then probably use something like RAxML to build the tree. If RAxML is too slow (it can struggle when the sequences are very long or, as in your case, very numerous), try FastTree.
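
As a rough illustration of the MUSCLE-then-FastTree route, the commands could be driven from Python as below. This is a sketch under stated assumptions: the helper names and file names are hypothetical, the flags are MUSCLE v3-style (`-in`, `-out`, `-maxiters`; newer MUSCLE releases use different options), the FastTree binary is assumed to be installed as `FastTree`, and `-nt` is only appropriate if the sequences are nucleotide.

```python
import subprocess


def muscle_command(fasta_in, aln_out, maxiters=None):
    """MUSCLE v3-style call; the alignment is written to aln_out."""
    cmd = ["muscle", "-in", fasta_in, "-out", aln_out]
    if maxiters is not None:
        cmd += ["-maxiters", str(maxiters)]
    return cmd


def fasttree_command(aln_in, nucleotide=True):
    """FastTree call; the Newick tree is written to standard output."""
    cmd = ["FastTree"]
    if nucleotide:
        cmd.append("-nt")
    return cmd + [aln_in]


def run_pipeline(fasta_in, aln_out, tree_out, maxiters=None, nucleotide=True):
    """Align with MUSCLE, then build a tree with FastTree."""
    subprocess.run(muscle_command(fasta_in, aln_out, maxiters), check=True)
    with open(tree_out, "w") as handle:
        subprocess.run(
            fasttree_command(aln_out, nucleotide), stdout=handle, check=True
        )
```

Keeping the command builders separate from `run_pipeline` makes it easy to print and inspect the exact command lines before running anything on the full data set.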

Biopython is best used for manipulating bioinformatics file types, not usually for creating them in the first place.

This is very generalised advice, however, so if you can provide more information about your input data and the actual question you’re attempting to answer, we might be able to provide more specific help.

written 2.6 years ago by Joe • modified 2.6 years ago

That's exactly what I needed. MUSCLE scales well for my problem with -maxiters 2. However, I have an issue with the second part of your suggestion, using FastTree for tree building. MUSCLE outputs an alignment file with a header containing the MUSCLE version number, and FastTree throws an error on it. Is there any workaround for this?

written 2.6 years ago by virus_n00b

Off the top of my head, MUSCLE should support all standard output file types. I can’t recall all the details right now and am not near a computer to test, but the online manual should list all the supported output types.

If not, it should be simple to do some standard command line text manipulation to remove the header line if the file is otherwise correct.
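
If the alignment was written in a Clustal-style layout, one possible workaround is to convert it to FASTA, which FastTree reads. The sketch below uses only the standard library; the function name is hypothetical, the example alignment is made up, and it assumes the first non-blank line is the program header and that conservation lines start with whitespace. Asking the aligner for FASTA output in the first place is probably simpler where that is an option.

```python
def clustal_to_fasta(lines):
    """Convert Clustal-style alignment text (header line included) to FASTA."""
    order, chunks = [], {}
    header_seen = False
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank separator between blocks
        if not header_seen:
            header_seen = True  # first non-blank line is the program header
            continue
        if line[0].isspace():
            continue  # conservation line under each block
        parts = line.split()
        if len(parts) < 2:
            continue
        name, segment = parts[0], parts[1]
        if name not in chunks:
            order.append(name)
            chunks[name] = []
        chunks[name].append(segment)
    return "".join(f">{name}\n{''.join(chunks[name])}\n" for name in order)


# Made-up two-block alignment with a MUSCLE-style header line.
example = """MUSCLE (3.8) multiple sequence alignment


seq1      ACGT-A
seq2      ACGTTA
          **** *

seq1      GG
seq2      GC
          *
"""
print(clustal_to_fasta(example.splitlines()))
```

The converter joins the per-block segments for each name in order of first appearance, so the output has one FASTA record per sequence.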

written 2.6 years ago by Joe