Hello everybody, I'm new to Biopython and to programming in general. I am trying to write a small script that iterates through a dictionary of taxIDs and, for each taxID, BLASTs the sequences in a protein FASTA file against that organism. When I run this code without the iteration, passing a single txid directly to the entrez_query argument, it usually works. But when I take the query from the dictionary, as in the script below, the final .txt file turns out to be empty. Does anyone have an idea why? Any help is welcome!
A glimpse of my multiple_aa.faa:
>sp|Q9FW44|ADR1_ARATH Disease resistance protein ADR1 OS=Arabidopsis thaliana OX=3702 GN=ADR1 PE=2 SV=2
MASFIDLFAGDITTQLLKLLALVANTVYSCKGIAERLITMIRDVQPTIREIQYSGAELSN
HHQTQLGVFYEILEKARKLCEKVLRCNRWNLKHVYHANKMKDLEKQISRFLNSQILLFVL
AEVCHLRVNGDRIERNMDRLLTERNDSLSFPETMMEIETVSDPEIQTVLELGKKKVKEMM
FKFTDTHLFGISGMSGSGKTTLAIELSKDDDVRGLFKNKVLFLTVSRSPNFENLESCIRE
FLYDGVHQRKLVILDDVWTRESLDRLMSKIRGSTTLVVSRSKLADPRTTYNVELLKKDEA
MSLLCLCAFEQKSPPSPFNKYLVKQVVDECKGLPLSLKVLGASLKNKPERYWEGVVKRLL
RGEAADETHESRVFAHMEESLENLDPKIRDCFLDMGAFPEDKKIPLDLLTSVWVERHDID
EETAFSFVLRLADKNLLTIVNNPRFGDVHIGYYDVFVTQHDVLRDLALHMSNRVDVNRRE
RLLMPKTEPVLPREWEKNKDEPFDAKIVSLHTGEMDEMNWFDMDLPKAEVLILNFSSDNY
VLPPFIGKMSRLRVLVIINNGMSPARLHGFSIFANLAKLRSLWLKRVHVPELTSCTIPLK
NLHKIHLIFCKVKNSFVQTSFDISKIFPSLSDLTIDHCDDLLELKSIFGITSLNSLSITN
CPRILELPKNLSNVQSLERLRLYACPELISLPVEVCELPCLKYVDISQCVSLVSLPEKFG
KLGSLEKIDMRECSLLGLPSSVAALVSLRHVICDEETSSMWEMVKKVVPELCIEVAKKCF
TVDWLDD
>sp|Q9FKZ1|DRL42_ARATH Probable disease resistance protein At5g66900 OS=Arabidopsis thaliana OX=3702 GN=At5g66900 PE=3 SV=1
MNDWASLGIGSIGEAVFSKLLKVVIDEAKKFKAFKPLSKDLVSTMEILFPLTQKIDSMQK
ELDFGVKELKELRDTIERADVAVRKFPRVKWYEKSKYTRKIERINKDMLKFCQIDLQLLQ
HRNQLTLLGLTGNLVNSVDGLSKRMDLLSVPAPVFRDLCSVPKLDKVIVGLDWPLGELKK
RLLDDSVVTLVVSAPPGCGKTTLVSRLCDDPDIKGKFKHIFFNVVSNTPNFRVIVQNLLQ
HNGYNALTFENDSQAEVGLRKLLEELKENGPILLVLDDVWRGADSFLQKFQIKLPNYKIL
VTSRFDFPSFDSNYRLKPLEDDDARALLIHWASRPCNTSPDEYEDLLQKILKRCNGFPIV
IEVVGVSLKGRSLNTWKGQVESWSEGEKILGKPYPTVLECLQPSFDALDPNLKECFLDMG
SFLEDQKIRASVIIDMWVELYGKGSSILYMYLEDLASQNLLKLVPLGTNEHEDGFYNDFL
VTQHDILRELAICQSEFKENLERKRLNLEILENTFPDWCLNTINASLLSISTDDLFSSKW
LEMDCPNVEALVLNLSSSDYALPSFISGMKKLKVLTITNHGFYPARLSNFSCLSSLPNLK
RIRLEKVSITLLDIPQLQLSSLKKLSLVMCSFGEVFYDTEDIVVSNALSKLQEIDIDYCY
DLDELPYWISEIVSLKTLSITNCNKLSQLPEAIGNLSRLEVLRLCSSMNLSELPEATEGL
SNLRFLDISHCLGLRKLPQEIGKLQNLKKISMRKCSGCELPESVTNLENLEVKCDEETGL
LWERLKPKMRNLRVQEEEIEHNLNLLQMF
import time
from urllib.error import HTTPError

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

# common name -> Entrez query string used to restrict the BLAST search
dic_tx = {"nicotiana":'"(txid4097[ORGN])"',"grapevine":'"(txid:29760[ORGN])"',"almond":'"(txid:3755[ORGN])"',"apple":'"(txid:3750[ORGN])"',"citrus":'"(txid:2711[ORGN])"',"coffee":'"(txid:13443[ORGN])"', "olive":'"(txid:4146[ORGN])"'}

for k,v in dic_tx.items():
    print(k)
    print(v)
    Entrez.email = '...@...'
    list_record_host = []
    for record in SeqIO.parse("multiple_aa.faa", format="fasta"):
        print(record.id)
        # print(record.seq)
        # online request
        try:
            result_handle = NCBIWWW.qblast("blastp","nr", record.format("fasta"),entrez_query=v, hitlist_size=1)
            print(result_handle)
        except HTTPError:
            # wait a little, then retry the request once
            time.sleep(5)
            result_handle = NCBIWWW.qblast("blastp","nr", record.format("fasta"),entrez_query=v, hitlist_size=1)
        # result handle stored in a list
        list_record_host.append(result_handle)
    # write all BLAST results for this organism to one XML file
    result_handle_list_host = open("%s.xml" % k, "w")
    for item in list_record_host:
        result_handle_list_host.write(item.read())
    result_handle_list_host.close()
    # result_handle_list_host
    # reopen the XML file and save the title of every hit
    reopen_result_handle = "%s.xml" % k
    blast_records = NCBIXML.parse(open(reopen_result_handle))
    save_file = open("%s_NLR.txt" % k, 'w')
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                save_file.write('>%s\n' % (alignment.title,))
        #here possibly to output something to file, between each blast_record
    save_file.close()
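
One thing that may be worth checking is the format of the query strings themselves. The sketch below assumes that the form that works in the single-query test is a bare `txid4097[ORGN]`, and it normalizes the dictionary values to that form by dropping the colon after `txid` and the embedded double quotes and parentheses. The helper `clean_entrez_query` and the name `clean_tx` are hypothetical, introduced only for illustration. Whether this is actually the cause of the empty file is an assumption, not something confirmed here.

```python
import re


def clean_entrez_query(raw):
    """Return a query of the form 'txidNNNN[ORGN]' from a loosely formatted string.

    Hypothetical helper: assumes the plain 'txidNNNN[ORGN]' form is what the
    BLAST server expects for entrez_query.
    """
    # Accept both 'txid4097' and 'txid:4097' and keep only the digits.
    match = re.search(r"txid:?\s*(\d+)", raw)
    if match is None:
        raise ValueError("no taxonomy ID found in %r" % raw)
    return "txid%s[ORGN]" % match.group(1)


# Same dictionary as in the script, with the values exactly as posted.
dic_tx = {
    "nicotiana": '"(txid4097[ORGN])"',
    "grapevine": '"(txid:29760[ORGN])"',
    "almond": '"(txid:3755[ORGN])"',
    "apple": '"(txid:3750[ORGN])"',
    "citrus": '"(txid:2711[ORGN])"',
    "coffee": '"(txid:13443[ORGN])"',
    "olive": '"(txid:4146[ORGN])"',
}

# Normalized copy: no embedded quotes, no parentheses, no colon.
clean_tx = {name: clean_entrez_query(query) for name, query in dic_tx.items()}

for name, query in clean_tx.items():
    print(name, query)
```

The values in `clean_tx` could then be passed as `entrez_query=` in the `NCBIWWW.qblast` call in place of `v`, to see whether the hits come back.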