Hello, I am trying to use Bio.Blast to perfrom blast similar to NCBI web-based blast. I got code from previous post and revised some to output the data in excel spreadsheet. I tried to have info of coverage and identity as well, which is why I have parameters of query_length, hsp.identities, and hsp.align_length.
from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML fasta_files = ['Prakit_EN117_981R_G05'] for i in fasta_files: import csv csvData = [['ID','score','bits','E-score','hsp.identities','hsp.align_length','query_length']] with open (i+'.csv','wb') as csvFile: writer = csv.writer(csvFile) writer.writerows(csvData) csvFile.close() fasta_string = open(i+".fasta").read() query=''.join(fasta_string.split('\n')[1:]) query_length = len(query) result_handle = NCBIWWW.qblast("blastn", "nr", fasta_string) with open(i+".xml","w")as out_handle: out_handle.write(result_handle.read()) result_handle.close() result_handle = open(i+".xml") blast_record = NCBIXML.read(result_handle) for alignment in blast_record.alignments: for hsp in alignment.hsps: cal_pidentity = float(hsp.identities/hsp.align_length) import csv row = [alignment.title,str(hsp.score),str(hsp.bits),str(hsp.expect),str(hsp.identities),str(hsp.align_length),str(query_length)] with open(i+'.csv','ab') as csvFile: writer = csv.writer(csvFile) writer.writerow(row) csvFile.close()
I successfully obtained the output. However, it raises many questions.
1.) When I further use hsp.identities, hsp.align_lenth, and query_length to manually calculated coverage and percent identity, there was some hit showing >100%coverage.
2.) I dont understand how the Bio.Blast sort the results because in my excel spreadsheet, some hits at the lower rows show higher calculated percent identity than upper rows.
3.) In overall, the results in my excel spreadsheet looks a bit different from results from NCBI web-based, using the same query.
How can I solve this problem?
Thank you so much