Question

Difference between NCBI Blast and Bio.Blast

0

Entering edit mode

5.0 years ago

saingamp ▴ 10

Hello, I am trying to use Bio.Blast to perfrom blast similar to NCBI web-based blast. I got code from previous post and revised some to output the data in excel spreadsheet. I tried to have info of coverage and identity as well, which is why I have parameters of query_length, hsp.identities, and hsp.align_length.

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML 
fasta_files = ['Prakit_EN117_981R_G05']
for i in fasta_files:
    import csv
    csvData = [['ID','score','bits','E-score','hsp.identities','hsp.align_length','query_length']]
    with open (i+'.csv','wb') as csvFile:
         writer = csv.writer(csvFile)
         writer.writerows(csvData)
    csvFile.close()
    fasta_string = open(i+".fasta").read()
    query=''.join(fasta_string.split('\n')[1:])
    query_length = len(query)
    result_handle = NCBIWWW.qblast("blastn", "nr", fasta_string)
    with open(i+".xml","w")as out_handle:
        out_handle.write(result_handle.read())
    result_handle.close()
    result_handle = open(i+".xml")
    blast_record = NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            cal_pidentity = float(hsp.identities/hsp.align_length)
            import csv
            row = [alignment.title,str(hsp.score),str(hsp.bits),str(hsp.expect),str(hsp.identities),str(hsp.align_length),str(query_length)]
            with open(i+'.csv','ab') as csvFile:
                writer = csv.writer(csvFile)
                writer.writerow(row)
            csvFile.close()

I successfully obtained the output. However, it raises many questions.

1.) When I further use hsp.identities, hsp.align_lenth, and query_length to manually calculated coverage and percent identity, there was some hit showing >100%coverage.

2.) I dont understand how the Bio.Blast sort the results because in my excel spreadsheet, some hits at the lower rows show higher calculated percent identity than upper rows.

3.) In overall, the results in my excel spreadsheet looks a bit different from results from NCBI web-based, using the same query.

How can I solve this problem?

Thank you so much

Biopython Blast • 1.5k views

ADD COMMENT • link updated 5.0 years ago by WouterDeCoster 47k • written 5.0 years ago by saingamp ▴ 10

0

Entering edit mode

can you confirm that you use the same database and param settings as the only NBCI blast?

inserting code is simply done by pasting it in your post, then select it and format it with the icon on top of the edit window: enter image description here

However if it's large amount of code it might be more useful (advisable) to use something like code-snippets from github gist for instance , some more info here

revised some to output the data in excel spreadsheet

why ... but why? :-)

ADD REPLY • link 5.0 years ago by lieven.sterck 15k

1

Entering edit mode

Hello lieven.sterck, I just found out that I can export .xml file from NCBI web-based too. Then, I compared details of parameters recorded in .xml files of NCBI web-based and the Bio.Blast I ran. They were different (i.e. match, mismatch, gap-open, gap-extend). Probably, that is reason why it is different?

ADD REPLY • link 5.0 years ago by saingamp ▴ 10

0

Entering edit mode

Hello lieven.sterck 1.) Thank you for the tips, I am looking into it now to revise the post. 2.) I only confirmed that they are the same database and same query but I didnt check other param settings yet ( I actually dont know how to). Could you guide me how to? 3.) I want to output the excel spreadsheet for conveniences in further process. I checked NCBI web-based, they have option to export spreadsheet but the hit title is not included.

Thank you so much

ADD REPLY • link 5.0 years ago by saingamp ▴ 10

0

Entering edit mode

On web blast page any blast algorithm you use has an expandable section towards the bottom of the screen that says Algorithm Parameters. Expand that section and make a note of defaults blast is using. You can then use them locally to reproduce web results.

While it is fine to export blast results as tab- or comma-limited value data using excel to analyze them is fraught with problems. If someone just needs to "view" that data then that application may be fine.

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

Entering edit mode

Hello, Thank you so much. I have one question. It seems like NCBI web-based blast using algorithm based on the article different from the one used in Bio.Blast, in that case, can we change the algorithm to be according to the NCBI web-based?

ADD REPLY • link 5.0 years ago by saingamp ▴ 10

0

Entering edit mode

Blast+ algorithms should be identical (i.e. if you are referring to blastn).

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

Entering edit mode

Why doesn’t Bio.Blast.NCBIWWW.qblast() give the same results as the NCBI BLAST website?

You need to specify the same options – the NCBI often adjust the default settings on the website, and they do not match the >QBLAST defaults anymore. Check things like the gap penalties and expectation threshold.

The BioPython cookbook FAQ #19 suggests the above, which is what Lieven was getting at.

ADD REPLY • link 5.0 years ago by Joe 21k