ClustalOmegaCommandline (multiple sequence alignment) no output file
0
0
Entering edit mode
3.8 years ago

Hi,

First of all, I am a bioinformatics newbie, so sorry for any obvious mistakes or syntax use. My aim is to get programmatic access to UniProt ID database to download multiple sequences and storing those in a fasta fila. With this file I would like to do multiple sequence alignment.

So far I am able to retrieve a fasta file with multiple sequences in the file. However, whenever I want to align them I don't get an error, but when I use the output file to read it for example it cannot be found. (I can also not find the file on my computer). I think the alignment is not done properly (or at least maybe I use the wrong format in the input file), but I don't know how to solve it.

This is my code.

    import urllib.parse
    import urllib.request

    url = 'https://www.uniprot.org/uploadlists/'

    UniprotID = 'Q9Y6I4 Q9Y5T5 Q9Y2K6'

    #Transfer Uniprot ID to ACC to get sequence information
    params = {
    'from': 'ACC+ID',
    'to': 'ACC',
    'format': 'fasta',
    'query': UniprotID
    }

    data = urllib.parse.urlencode(params)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    with urllib.request.urlopen(req) as f:
       response = f.read()
    response = response.decode('utf-8')
    print(response)

#to store data in fasta file
    with open("in_filename.fasta", "w") as f:
        for x in response:
            f.write(str(x))


#multiple sequence alignment
from Bio.Align.Applications import ClustalOmegaCommandline 
    in_file = "in_filename.fasta"
    out_file = "out_filename.phylip"
    clustalomega_cline = ClustalOmegaCommandline(infile = in_file, outfile = out_file, outfmt = 'phylip', verbose = True, auto = False)
    print(clustalomega_cline)

    from Bio import AlignIO
    alignment = AlignIO.read("out_filename.phylip", "clustalo")

The printed response and also fasta file looks like:

>sp|Q9Y6I4|UBP3_HUMAN Ubiquitin carboxyl-terminal hydrolase 3 OS=Homo sapiens OX=9606 GN=USP3 PE=1 SV=2
MECPHLSSSVCIAPDSAKFPNGSPSSWCCSVCRSNKSPWVCLTCSSVHCGRYVNGHAKKH
YEDAQVPLTNHKKSEKQDKVQHTVCMDCSSYSTYCYRCDDFVVNDTKLGLVQKVREHLQN
LENSAFTADRHKKRKLLENSTLNSKLLKVNGSTTAICATGLRNLGNTCFMNAILQSLSNI
EQFCCYFKELPAVELRNGKTAGRRTYHTRSQGDNNVSLVEEFRKTLCALWQGSQTAFSPE
SLFYVVWKIMPNFRGYQQQDAHEFMRYLLDHLHLELQGGFNGVSRSAILQENSTLSASNK
CCINGASTVVTAIFGGILQNEVNCLICGTESRKFDPFLDLSLDIPSQFRSKRSKNQENGP
VCSLRDCLRSFTDLEELDETELYMCHKCKKKQKSTKKFWIQKLPKVLCLHLKRFHWTAYL
RNKVDTYVEFPLRGLDMKCYLLEPENSGPESCLYDLAAVVVHHGSGVGSGHYTAYATHEG
RWFHFNDSTVTLTDEETVVKAKAYILFYVEHQAKAGSDKL
>sp|Q9Y5T5|UBP16_HUMAN Ubiquitin carboxyl-terminal hydrolase 16 OS=Homo sapiens OX=9606 GN=USP16 PE=1 SV=1
MGKKRTKGKTVPIDDSSETLEPVCRHIRKGLEQGNLKKALVNVEWNICQDCKTDNKVKDK
AEEETEEKPSVWLCLKCGHQGCGRNSQEQHALKHYLTPRSEPHCLVLSLDNWSVWCYVCD
NEVQYCSSNQLGQVVDYVRKQASITTPKPAEKDNGNIELENKKLEKESKNEQEREKKENM
AKENPPMNSPCQITVKGLSNLGNTCFFNAVMQNLSQTPVLRELLKEVKMSGTIVKIEPPD
LALTEPLEINLEPPGPLTLAMSQFLNEMQETKKGVVTPKELFSQVCKKAVRFKGYQQQDS
QELLRYLLDGMRAEEHQRVSKGILKAFGNSTEKLDEELKNKVKDYEKKKSMPSFVDRIFG
GELTSMIMCDQCRTVSLVHESFLDLSLPVLDDQSGKKSVNDKNLKKTVEDEDQDSEEEKD
NDSYIKERSDIPSGTSKHLQKKAKKQAKKQAKNQRRQQKIQGKVLHLNDICTIDHPEDSE
YEAEMSLQGEVNIKSNHISQEGVMHKEYCVNQKDLNGQAKMIESVTDNQKSTEEVDMKNI
NMDNDLEVLTSSPTRNLNGAYLTEGSNGEVDISNGFKNLNLNAALHPDEINIEILNDSHT
PGTKVYEVVNEDPETAFCTLANREVFNTDECSIQHCLYQFTRNEKLRDANKLLCEVCTRR
QCNGPKANIKGERKHVYTNAKKQMLISLAPPVLTLHLKRFQQAGFNLRKVNKHIKFPEIL
DLAPFCTLKCKNVAEENTRVLYSLYGVVEHSGTMRSGHYTAYAKARTANSHLSNLVLHGD
IPQDFEMESKGQWFHISDTHVQAVPTTKVLNSQAYLLFYERIL
>sp|Q9Y2K6|UBP20_HUMAN Ubiquitin carboxyl-terminal hydrolase 20 OS=Homo sapiens OX=9606 GN=USP20 PE=1 SV=2
MGDSRDLCPHLDSIGEVTKEDLLLKSKGTCQSCGVTGPNLWACLQVACPYVGCGESFADH
STIHAQAKKHNLTVNLTTFRLWCYACEKEVFLEQRLAAPLLGSSSKFSEQDSPPPSHPLK
AVPIAVADEGESESEDDDLKPRGLTGMKNLGNSCYMNAALQALSNCPPLTQFFLECGGLV
RTDKKPALCKSYQKLVSEVWHKKRPSYVVPTSLSHGIKLVNPMFRGYAQQDTQEFLRCLM
DQLHEELKEPVVATVALTEARDSDSSDTDEKREGDRSPSEDEFLSCDSSSDRGEGDGQGR
GGGSSQAETELLIPDEAGRAISEKERMKDRKFSWGQQRTNSEQVDEDADVDTAMAALDDQ
PAEAQPPSPRSSSPCRTPEPDNDAHLRSSSRPCSPVHHHEGHAKLSSSPPRASPVRMAPS
YVLKKAQVLSAGSRRRKEQRYRSVISDIFDGSILSLVQCLTCDRVSTTVETFQDLSLPIP
GKEDLAKLHSAIYQNVPAKPGACGDSYAAQGWLAFIVEYIRRFVVSCTPSWFWGPVVTLE
DCLAAFFAADELKGDNMYSCERCKKLRNGVKYCKVLRLPEILCIHLKRFRHEVMYSFKIN
SHVSFPLEGLDLRPFLAKECTSQITTYDLLSVICHHGTAGSGHYIAYCQNVINGQWYEFD
DQYVTEVHETVVQNAEGYVLFYRKSSEEAMRERQQVVSLAAMREPSLLRFYVSREWLNKF
NTFAEPGPITNQTFLCSHGGIPPHKYHYIDDLVVILPQNVWEHLYNRFGGGPAVNHLYVC
SICQVEIEALAKRRRIEIDTFIKLNKAFQAEESPGVIYCISMQWFREWEAFVKGKDNEPP
GPIDNSRIAQVKGSGHVQLKQGADYGQISEETWTYLNSLYGGGPEIAIRQSVAQPLGPEN
LHGEQKIEAETRAV

The printed cline is:

clustalo -i in_filename.fasta -o out_filename.phylip --outfmt phylip -v

However, the error I get when reading the out_filename.phylip I get: (I also cannot find it on my computer)

FileNotFoundError: [Errno 2] No such file or directory: 'out_filename.phylip'

I'm not sure where the problem is and I even question if the alignment is performed. If I upload the "in_filename.fasta" to the Clustal website for alignment, there is no problem and sequences are aligned. So therefore I am not really sure how to solve this. Any help/tips/recommendations are really welcome!!

Thanks in advance! Eve

sequence • 2.4k views
ADD COMMENT
0
Entering edit mode

Does that command complete successfully if you run the clustalomega directly rather than through BioPython?

ADD REPLY
0
Entering edit mode

From the documentation (https://biopython.org/DIST/docs/api/Bio.Align.Applications._ClustalOmega.ClustalOmegaCommandline-class.html)

You need to execute the clustalomega_cline (imagine it as if you've formatted parameters to a string, and you need to paste it to terminal). Use the clustalomega_cline() to actually run the program.

ADD REPLY
0
Entering edit mode

Ooh thank you that makes more sense!

Whenever I do that I get the error message:

ApplicationError: Non-zero return code 127 from 'clustalo -i unaligned.fa -o aligned.phy --outfmt phylip -v', message '/bin/sh: clustalo: command not found'

Would that mean that Clustal-Omega is not downloaded properly?

ADD REPLY
0
Entering edit mode

This is why I asked if you are able to run the command correctly without biopython. It seems you don't have it installed, or it isn't available in $PATH under the correct name.

what is the output of the command: which -a clustalo ?

ADD REPLY

Login before adding your answer.

Traffic: 2130 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6