Question: FASTA with multiple sequences to an alignment object which can be used to build a phylogenetic tree
mac03pat wrote (23 days ago):

I am attempting to take a single FASTA file containing multiple sequences of variable length as input, and output aligned sequences that I can use to build a phylogenetic tree with Biopython's Bio.Phylo.

Here's my file: https://drive.google.com/file/d/1QXXSJ2DJjJHz8K1WHuERFPQBTSvsWrcL/view?usp=sharing

Things I've tried:

from Bio import AlignIO
alignment = AlignIO.read('extracted_KS_with_taxa.fa', 'fasta')
print(alignment.format('fasta'))

^ Doesn't work for sequences of unequal length

from Bio.Align.Applications import MuscleCommandline

cline = MuscleCommandline(input='extracted_KS_with_taxa.fa', out='aligned_KS.aln', clwstrict=True)
print(cline)

^ Didn't output a file

from Bio.Align.Applications import MuscleCommandline
muscle_cline = MuscleCommandline(input='extracted_KS_with_taxa.fa')
stdout, stderr = muscle_cline()
from io import StringIO  # Python 3; the separate StringIO module exists only in Python 2
from Bio import AlignIO
align = AlignIO.read(StringIO(stdout), 'fasta')
print(align)

^ Returned this error:

Traceback (most recent call last):
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\MBSProject\align_fasta.py", line 20, in <module>
    stdout, stderr = muscle_cline()
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\lib\site-packages\Bio\Application\__init__.py", line 527, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'muscle -in extracted_KS_with_taxa.fa', message "'muscle' is not recognized as an internal or external command,"
jrj.healey (United Kingdom) wrote (22 days ago):

In your first case, I think the problem is that you’re trying to use AlignIO to read a FASTA file of unaligned sequences, not an alignment (if I understand your data correctly).

AlignIO is specifically for reading formats of pre-aligned data, where every sequence has the same length, whereas SeqIO is what you need for reading basic sequence data.
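To illustrate the difference, here is a minimal sketch assuming Biopython is installed. The two short sequences are made up to stand in for the real file, and the helper names sequence_lengths and is_alignment are hypothetical. SeqIO reads records of any length, while AlignIO raises a ValueError when the sequences differ in length.

```python
from io import StringIO

from Bio import AlignIO, SeqIO

# Two made-up sequences of different lengths, standing in for the real file.
UNALIGNED_FASTA = ">seq1\nATGCATGC\n>seq2\nATGC\n"


def sequence_lengths(handle):
    """Map each record ID to its sequence length, reading with SeqIO."""
    return {record.id: len(record.seq) for record in SeqIO.parse(handle, "fasta")}


def is_alignment(handle):
    """Return True if AlignIO accepts the data as an alignment."""
    try:
        AlignIO.read(handle, "fasta")
    except ValueError:
        # Raised, for example, when the sequences are not all the same length.
        return False
    return True


print(sequence_lengths(StringIO(UNALIGNED_FASTA)))
print(is_alignment(StringIO(UNALIGNED_FASTA)))
```

If the lengths differ, the file still needs to go through an aligner before AlignIO can read it.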

Secondly, print(cline) only prints the command line itself, not the result of the alignment, because constructing a MuscleCommandline object doesn’t run anything. You first need to actually run muscle by calling the object, as you do in your third attempt, and for that you also need muscle installed.

Your last command fails because muscle isn’t installed: Biopython shells out to run muscle on the command line, but the shell doesn’t recognise the command because there is no corresponding muscle binary installed (or it isn’t on your PATH).
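As a rough sketch of how to check for the binary before running it, the following uses only the Python standard library. The function names are hypothetical, and the -in/-out options are an assumption based on the MUSCLE 3.x style call shown in the traceback; other MUSCLE versions use different options, so check the help output of whichever version you install.

```python
import shutil
import subprocess


def find_executable(name="muscle"):
    """Return the full path to `name` if it is on PATH, else None."""
    return shutil.which(name)


def muscle_v3_command(in_path, out_path, executable="muscle"):
    """Build a MUSCLE 3.x style argument list (assumed -in / -out options)."""
    return [executable, "-in", in_path, "-out", out_path]


def run_alignment(in_path, out_path):
    """Run muscle if it is installed; raise a clear error otherwise."""
    executable = find_executable("muscle")
    if executable is None:
        raise FileNotFoundError(
            "muscle not found on PATH; install it or use its full path"
        )
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(muscle_v3_command(in_path, out_path, executable), check=True)
    return out_path
```

Calling run_alignment('extracted_KS_with_taxa.fa', 'aligned_KS.fasta') should then write an aligned file, which is the kind of input AlignIO.read expects.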

I suggest you look closely at the Biopython Tutorial, as there are a good many things you’ve got mixed up here.
