Question: Getting a blank file when using FastTree
2.8 years ago, michaelbeneamatobruno wrote:

I'm currently trying to build a phylogenetic tree from 12 whole genomes. The genomes were mapped to a reference and have been converted a few times (BAM to SAM, SAM to FASTQ, FASTQ to FASTA), but they look fine (we loaded one into Geneious to check that everything was in order). I'm using FastTree to build a tree from these FASTA files, but every time I give it one of the genomes, the program runs for a while without producing anything: the output tree file is empty. The command I'm using is

./FastTree -nt Filename.fasta > myTree

I've seen a similar thread, and I've checked that the program is downloaded, that I'm calling it correctly, and that I'm pointing it at the right files, but I'm still getting a blank file. Any suggestions would be appreciated.

Thanks.

modified 2.8 years ago by RamRS • written 2.8 years ago by michaelbeneamatobruno

Can you also add 2>myTree.err to the command and see whether the error file has anything to say? If there was absolutely nothing on stdout, you might need to try a different input file and see if that works.
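
If it helps, the same idea (tree on stdout to one file, messages on stderr to another) can be sketched in Python. This is only a rough illustration: run_with_logs is a made-up helper name, and the FastTree path and file names in the comment are taken from the question, not verified.

```python
import subprocess


def run_with_logs(cmd, out_path, err_path):
    """Run cmd, sending stdout to out_path and stderr to err_path.

    Returns the exit code so a failure is visible even when both files are empty.
    """
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode


# Hypothetical usage, assuming FastTree sits in the current directory:
# code = run_with_logs(["./FastTree", "-nt", "Filename.fasta"], "myTree", "myTree.err")
# print("exit code:", code)
```

A non-zero exit code together with an empty tree file would point to the program stopping early, and the .err file is where to look for the reason.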

written 2.8 years ago by RamRS

Is your .fasta an actual FASTA alignment file, or just a load of genome FASTAs? FastTree takes pre-aligned FASTA or PHYLIP files as input.

written 2.8 years ago by Joe

It's an actual FASTA file; the first few lines look like this:

>9757X1
GCTACTATAAGAGTTGTCTAGTAATTCTTAGTAGAAAAGAGTTATTAGAG
ATATCTTATAGTACGTCTTTAAACTTAGCTACTCTAAGATTAATAGTAGT
ATATCTTATAGTACGTCTTTAAACTTAGCTACTCTAAGATTAATAGTAGT

followed by many more lines of A's, T's, C's, and G's

modified 2.8 years ago • written 2.8 years ago by michaelbeneamatobruno

Just to continue the question of jrj.healey...

Apologies for the simple question: does your multi-sequence FASTA file contain aligned sequences? That is, are all sequences the same length, padded out with gap characters (-) where needed to make them the same length?

So A's T's C's G's and gap characters (-).
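
A quick way to check this is to total the sequence length under each header and see whether they all match. The sketch below is a minimal illustration, not part of FastTree; read_fasta_lengths and looks_aligned are hypothetical helper names, and it assumes a plain-text FASTA file with unique record IDs.

```python
def read_fasta_lengths(path):
    """Return {record_id: total sequence length} for each record in a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                # Gap characters (-) count towards the aligned length.
                lengths[current] += len(line)
    return lengths


def looks_aligned(path):
    """True if the file has at least two records and all have the same length."""
    lengths = read_fasta_lengths(path)
    return len(lengths) >= 2 and len(set(lengths.values())) == 1
```

Equal lengths are necessary for an alignment but do not prove the columns are homologous, and a file holding a single genome under one header will always fail this check because there is nothing to align it against.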

modified 2.8 years ago • written 2.8 years ago by Frank Wright (BioSS)

Pretty sure we might be getting to the nub of the issue here...

You first need to generate a sequence alignment before you can make a tree. If you're attempting to align 12 whole genomes (though you haven't said what they are/how big), this may take a very long time - someone who is more up to date on the state of aligning genomes might know better. I believe MUMmer is capable of aligning whole bacterial genomes via suffix trees.

You may have to fall back on something like MLST instead in order to infer a phylogeny in a reasonable timescale.

written 2.8 years ago by Joe

They're pretty big files, between 2 and 6 GB. I'm open to trying different software, but we're looking for phylogenetic info for the BFODMAT revision if at all possible, so I don't know whether other software will output it in the same format. They're also fungal genomes, so I'm hesitant to use tools built for bacterial genome phylogeny. FastTree may well not be suited to such big files, but it would be nice to get it working, because FastTree gives us the right output without any additional conversion.

Thanks by the way for all the input. I think I might revise my question with everything that you guys are saying. Any other comments before I do so?

written 2.8 years ago by michaelbeneamatobruno

They should be; I aligned them using samtools. This is my first time doing this, so I could definitely have gone wrong somewhere, but I think they've been aligned correctly.

written 2.8 years ago by michaelbeneamatobruno

These are genomes you've sequenced yourself?

Samtools is a suite of programs for manipulating Sequence Alignment/Map (SAM) data, but that's not the same thing as multiple sequence alignment - the naming doesn't make the distinction very clear, I grant you!

Working with SAM data typically means you've aligned sequencing reads to a reference genome (usually with tools like bwa or bowtie) to check statistics like genome coverage and so on.

Multiple sequence alignment just takes a set of sequences and does what it says on the tin: it aligns them with one another.

It sounds to me like you're using the wrong input data. You need to take your FASTA files and align them with something like clustalo or MUSCLE, or more likely something like MUMmer, as mentioned, which can deal with very big sequences (though only pairwise).

I strongly suspect it may be impossible to run a multiple sequence alignment on whole fungal genomes, though. Someone else might know better.

modified 2.8 years ago • written 2.8 years ago by Joe

When I tried that, I got:

FastTree Version 2.1.9 SSE3
Alignment: 9751X1.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories

I've tried several different files with the same results. I thought that it might be a problem with FastTree, so I redownloaded the program and nothing changed.

written 2.8 years ago by michaelbeneamatobruno

I'd recommend you add that to your question. I'm not the right person to answer your question, but this additional information might help the right person come to a more helpful conclusion.

written 2.8 years ago by RamRS