ClustalOmegaCommandline (multiple sequence alignment) no output file
0
0
Entering edit mode
9 months ago

Hi,

First of all, I am a bioinformatics newbie, so sorry for any obvious mistakes or syntax use. My aim is to get programmatic access to UniProt ID database to download multiple sequences and storing those in a fasta fila. With this file I would like to do multiple sequence alignment.

So far I am able to retrieve a fasta file with multiple sequences in the file. However, whenever I want to align them I don't get an error, but when I use the output file to read it for example it cannot be found. (I can also not find the file on my computer). I think the alignment is not done properly (or at least maybe I use the wrong format in the input file), but I don't know how to solve it.

This is my code.

    import urllib.parse
import urllib.request

UniprotID = 'Q9Y6I4 Q9Y5T5 Q9Y2K6'

#Transfer Uniprot ID to ACC to get sequence information
params = {
'from': 'ACC+ID',
'to': 'ACC',
'format': 'fasta',
'query': UniprotID
}

data = urllib.parse.urlencode(params)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as f:
response = response.decode('utf-8')
print(response)

#to store data in fasta file
with open("in_filename.fasta", "w") as f:
for x in response:
f.write(str(x))

#multiple sequence alignment
from Bio.Align.Applications import ClustalOmegaCommandline
in_file = "in_filename.fasta"
out_file = "out_filename.phylip"
clustalomega_cline = ClustalOmegaCommandline(infile = in_file, outfile = out_file, outfmt = 'phylip', verbose = True, auto = False)
print(clustalomega_cline)

from Bio import AlignIO


The printed response and also fasta file looks like:

>sp|Q9Y6I4|UBP3_HUMAN Ubiquitin carboxyl-terminal hydrolase 3 OS=Homo sapiens OX=9606 GN=USP3 PE=1 SV=2
MECPHLSSSVCIAPDSAKFPNGSPSSWCCSVCRSNKSPWVCLTCSSVHCGRYVNGHAKKH
YEDAQVPLTNHKKSEKQDKVQHTVCMDCSSYSTYCYRCDDFVVNDTKLGLVQKVREHLQN
EQFCCYFKELPAVELRNGKTAGRRTYHTRSQGDNNVSLVEEFRKTLCALWQGSQTAFSPE
SLFYVVWKIMPNFRGYQQQDAHEFMRYLLDHLHLELQGGFNGVSRSAILQENSTLSASNK
CCINGASTVVTAIFGGILQNEVNCLICGTESRKFDPFLDLSLDIPSQFRSKRSKNQENGP
VCSLRDCLRSFTDLEELDETELYMCHKCKKKQKSTKKFWIQKLPKVLCLHLKRFHWTAYL
RNKVDTYVEFPLRGLDMKCYLLEPENSGPESCLYDLAAVVVHHGSGVGSGHYTAYATHEG
RWFHFNDSTVTLTDEETVVKAKAYILFYVEHQAKAGSDKL
>sp|Q9Y5T5|UBP16_HUMAN Ubiquitin carboxyl-terminal hydrolase 16 OS=Homo sapiens OX=9606 GN=USP16 PE=1 SV=1
MGKKRTKGKTVPIDDSSETLEPVCRHIRKGLEQGNLKKALVNVEWNICQDCKTDNKVKDK
AEEETEEKPSVWLCLKCGHQGCGRNSQEQHALKHYLTPRSEPHCLVLSLDNWSVWCYVCD
NEVQYCSSNQLGQVVDYVRKQASITTPKPAEKDNGNIELENKKLEKESKNEQEREKKENM
AKENPPMNSPCQITVKGLSNLGNTCFFNAVMQNLSQTPVLRELLKEVKMSGTIVKIEPPD
LALTEPLEINLEPPGPLTLAMSQFLNEMQETKKGVVTPKELFSQVCKKAVRFKGYQQQDS
QELLRYLLDGMRAEEHQRVSKGILKAFGNSTEKLDEELKNKVKDYEKKKSMPSFVDRIFG
GELTSMIMCDQCRTVSLVHESFLDLSLPVLDDQSGKKSVNDKNLKKTVEDEDQDSEEEKD
NDSYIKERSDIPSGTSKHLQKKAKKQAKKQAKNQRRQQKIQGKVLHLNDICTIDHPEDSE
YEAEMSLQGEVNIKSNHISQEGVMHKEYCVNQKDLNGQAKMIESVTDNQKSTEEVDMKNI
NMDNDLEVLTSSPTRNLNGAYLTEGSNGEVDISNGFKNLNLNAALHPDEINIEILNDSHT
PGTKVYEVVNEDPETAFCTLANREVFNTDECSIQHCLYQFTRNEKLRDANKLLCEVCTRR
QCNGPKANIKGERKHVYTNAKKQMLISLAPPVLTLHLKRFQQAGFNLRKVNKHIKFPEIL
DLAPFCTLKCKNVAEENTRVLYSLYGVVEHSGTMRSGHYTAYAKARTANSHLSNLVLHGD
IPQDFEMESKGQWFHISDTHVQAVPTTKVLNSQAYLLFYERIL
>sp|Q9Y2K6|UBP20_HUMAN Ubiquitin carboxyl-terminal hydrolase 20 OS=Homo sapiens OX=9606 GN=USP20 PE=1 SV=2
STIHAQAKKHNLTVNLTTFRLWCYACEKEVFLEQRLAAPLLGSSSKFSEQDSPPPSHPLK
RTDKKPALCKSYQKLVSEVWHKKRPSYVVPTSLSHGIKLVNPMFRGYAQQDTQEFLRCLM
DQLHEELKEPVVATVALTEARDSDSSDTDEKREGDRSPSEDEFLSCDSSSDRGEGDGQGR
YVLKKAQVLSAGSRRRKEQRYRSVISDIFDGSILSLVQCLTCDRVSTTVETFQDLSLPIP
GKEDLAKLHSAIYQNVPAKPGACGDSYAAQGWLAFIVEYIRRFVVSCTPSWFWGPVVTLE
SHVSFPLEGLDLRPFLAKECTSQITTYDLLSVICHHGTAGSGHYIAYCQNVINGQWYEFD
DQYVTEVHETVVQNAEGYVLFYRKSSEEAMRERQQVVSLAAMREPSLLRFYVSREWLNKF
NTFAEPGPITNQTFLCSHGGIPPHKYHYIDDLVVILPQNVWEHLYNRFGGGPAVNHLYVC
SICQVEIEALAKRRRIEIDTFIKLNKAFQAEESPGVIYCISMQWFREWEAFVKGKDNEPP
LHGEQKIEAETRAV


The printed cline is:

clustalo -i in_filename.fasta -o out_filename.phylip --outfmt phylip -v


However, the error I get when reading the out_filename.phylip I get: (I also cannot find it on my computer)

FileNotFoundError: [Errno 2] No such file or directory: 'out_filename.phylip'


I'm not sure where the problem is and I even question if the alignment is performed. If I upload the "in_filename.fasta" to the Clustal website for alignment, there is no problem and sequences are aligned. So therefore I am not really sure how to solve this. Any help/tips/recommendations are really welcome!!

sequence • 266 views
0
Entering edit mode

Does that command complete successfully if you run the clustalomega directly rather than through BioPython?

0
Entering edit mode

You need to execute the clustalomega_cline (imagine it as if you've formatted parameters to a string, and you need to paste it to terminal). Use the clustalomega_cline() to actually run the program.

0
Entering edit mode

Ooh thank you that makes more sense!

Whenever I do that I get the error message:

ApplicationError: Non-zero return code 127 from 'clustalo -i unaligned.fa -o aligned.phy --outfmt phylip -v', message '/bin/sh: clustalo: command not found'


0
Entering edit mode

This is why I asked if you are able to run the command correctly without biopython. It seems you don't have it installed, or it isn't available in \$PATH under the correct name.

what is the output of the command: which -a clustalo ?