Question: Issue with the FastTree package
5 days ago, virus_n00b (India) wrote:

I need to build a phylogenetic tree for a group of sequences. I am using MUSCLE to align the sequences. MUSCLE produces an alignment file that begins with the following header:

MUSCLE (3.8) multiple sequence alignment

To build the tree, I am using FastTree with the following command:

FastTree PATH_TO_ALIGNMENT_FILE > PATH_TO_TREE_FILE

which results in the following error:

FastTree Version 2.1.10 SSE3
Alignment: ../muscle/b1300dc46e02615c56cd762b141547c0.muscle
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Error parsing header line:MUSCLE (3.8) multiple sequence alignment

Is there a workaround for this error?

Input file: https://drive.google.com/open?id=1-13zaYJeL0XWNMHHKUZG62b4bCh5qZ4I

File generated by MUSCLE: https://drive.google.com/open?id=13Id1UqBtaLGT7HdK8_tb-X5DuFPchy-y
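
FastTree's documentation lists FASTA and interleaved PHYLIP as its input formats, so a file that begins with a program banner such as `MUSCLE (3.8) multiple sequence alignment` is probably CLUSTAL-style output rather than FASTA. Below is a minimal Python sketch for checking this before running FastTree. The function names `sniff_alignment_format` and `fasttree_can_read` are hypothetical, and the check looks only at the first non-blank line, so it is a rough guess rather than a validator.

```python
def sniff_alignment_format(text):
    """Guess an alignment format from the first non-blank line of `text`."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            return "fasta"
        # CLUSTAL-style files open with a program banner, e.g.
        # "CLUSTAL W (1.81) ..." or "MUSCLE (3.8) multiple sequence alignment".
        if line.startswith(("CLUSTAL", "MUSCLE")):
            return "clustal"
        # PHYLIP files open with two integers: sequence count and length.
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            return "phylip"
        return "unknown"
    return "empty"


def fasttree_can_read(text):
    """True if the guessed format is one FastTree documents as supported."""
    return sniff_alignment_format(text) in {"fasta", "phylip"}
```

Calling `fasttree_can_read(open(path).read())` on the alignment file before invoking FastTree would flag a CLUSTAL-style file early, instead of failing at the header line.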

Tags: muscle, fasttree, sequence

What is the format of the MUSCLE alignment file? Is it FASTA?

written 5 days ago by Sej Modha

Yes, it is FASTA. I ran MUSCLE with the following command:

muscle -in b1300dc46e02615c56cd762b141547c0.fasta -out b1300dc46e02615c56cd762b141547c0.muscle

I have also added links to the input file and to the alignment file generated by MUSCLE.

modified 5 days ago • written 5 days ago by virus_n00b

The input file you provided is in FASTA format (https://en.wikipedia.org/wiki/FASTA_format), but its sequences contain only Ns. The output file is not in FASTA format. Are you sure you want to run FastTree on data that contain only Ns?

modified 5 days ago • written 5 days ago by Sej Modha

The sequences are a collection of Ms and Ns, with very few Ms. If the output file is not in FASTA format, then the issue must be on the MUSCLE side. I am sure that my input file is correct, because the same file produces an alignment and a phylogenetic tree with the web service (https://www.ebi.ac.uk/Tools/msa/muscle/). However, the web service caps the number of sequences and the file size, so I need to run these tools on my local system.

written 5 days ago by virus_n00b

I am unable to replicate the error at my end with your data: the output file generated for me is in FASTA format. It may be worth generating the output file in ClustalW format and converting it with an online tool such as https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
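
The same conversion can also be done locally. Below is a minimal, stdlib-only Python sketch that turns a CLUSTAL-style alignment into aligned FASTA. It assumes the first non-blank line is the program banner, that every sequence line has the form `name residues`, and that conservation lines start with whitespace. The function name `clustal_to_fasta` is hypothetical, and real files may need a more careful parser.

```python
def clustal_to_fasta(text, width=60):
    """Convert a CLUSTAL-style alignment string to aligned FASTA."""
    lines = iter(text.splitlines())
    # Consume lines up to and including the banner, e.g.
    # "MUSCLE (3.8) multiple sequence alignment".
    for line in lines:
        if line.strip():
            break

    sequences = {}  # name -> list of chunks; dicts keep insertion order
    for line in lines:
        # Skip blank lines and conservation lines (leading whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        sequences.setdefault(name, []).append(chunk)

    out = []
    for name, chunks in sequences.items():
        seq = "".join(chunks)
        out.append(">" + name)
        # Wrap the aligned sequence (gaps kept as "-") at `width` columns.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

Writing the returned string to a new file and passing that file to FastTree avoids the round trip through an online converter.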

written 4 days ago by Sej Modha

Since you are unable to replicate the error, I assume something is wrong on my side. Could you try the following at your end?

  1. Use MUSCLE to align the input file with:

    muscle -in b1300dc46e02615c56cd762b141547c0.fasta -out b1300dc46e02615c56cd762b141547c0.muscle -clwstrict -maxiters 2

  2. Run FastTree on the generated b1300dc46e02615c56cd762b141547c0.muscle file:

    FastTree b1300dc46e02615c56cd762b141547c0.muscle > b1300dc46e02615c56cd762b141547c0.tree
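
For comparison, here is a Python sketch of the same two steps without `-clwstrict`. It assumes MUSCLE 3.8 writes FASTA when no output-format flag is given, which matches what Sej Modha reported seeing. The `.afa` suffix and the function names `build_commands` and `run_pipeline` are hypothetical, and both `muscle` and `FastTree` are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(fasta_path):
    """Return the MUSCLE command, the FastTree command, and the tree path."""
    fasta = Path(fasta_path)
    aligned = fasta.with_suffix(".afa")  # hypothetical name for the alignment
    tree = fasta.with_suffix(".tree")
    # No -clw / -clwstrict flag: rely on MUSCLE's default output format.
    muscle_cmd = ["muscle", "-in", str(fasta), "-out", str(aligned)]
    # FastTree writes the tree to stdout; run_pipeline redirects it to a file.
    fasttree_cmd = ["FastTree", str(aligned)]
    return muscle_cmd, fasttree_cmd, tree


def run_pipeline(fasta_path):
    """Align with MUSCLE, then build a tree with FastTree."""
    muscle_cmd, fasttree_cmd, tree = build_commands(fasta_path)
    subprocess.run(muscle_cmd, check=True)
    with open(tree, "w") as handle:
        subprocess.run(fasttree_cmd, stdout=handle, check=True)
    return tree
```

`check=True` makes either step raise `subprocess.CalledProcessError` if the tool exits with a non-zero status, so a parsing failure in FastTree is reported and the run stops there.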

written 4 days ago by virus_n00b