Question: Difference between NCBI Blast and Bio.Blast
0
gravatar for saingamp
12 months ago by
saingamp10
saingamp10 wrote:

Hello, I am trying to use Bio.Blast to perfrom blast similar to NCBI web-based blast. I got code from previous post and revised some to output the data in excel spreadsheet. I tried to have info of coverage and identity as well, which is why I have parameters of query_length, hsp.identities, and hsp.align_length.

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML 
fasta_files = ['Prakit_EN117_981R_G05']
for i in fasta_files:
    import csv
    csvData = [['ID','score','bits','E-score','hsp.identities','hsp.align_length','query_length']]
    with open (i+'.csv','wb') as csvFile:
         writer = csv.writer(csvFile)
         writer.writerows(csvData)
    csvFile.close()
    fasta_string = open(i+".fasta").read()
    query=''.join(fasta_string.split('\n')[1:])
    query_length = len(query)
    result_handle = NCBIWWW.qblast("blastn", "nr", fasta_string)
    with open(i+".xml","w")as out_handle:
        out_handle.write(result_handle.read())
    result_handle.close()
    result_handle = open(i+".xml")
    blast_record = NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            cal_pidentity = float(hsp.identities/hsp.align_length)
            import csv
            row = [alignment.title,str(hsp.score),str(hsp.bits),str(hsp.expect),str(hsp.identities),str(hsp.align_length),str(query_length)]
            with open(i+'.csv','ab') as csvFile:
                writer = csv.writer(csvFile)
                writer.writerow(row)
            csvFile.close()

I successfully obtained the output. However, it raises many questions.

1.) When I further use hsp.identities, hsp.align_lenth, and query_length to manually calculated coverage and percent identity, there was some hit showing >100%coverage.

2.) I dont understand how the Bio.Blast sort the results because in my excel spreadsheet, some hits at the lower rows show higher calculated percent identity than upper rows.

3.) In overall, the results in my excel spreadsheet looks a bit different from results from NCBI web-based, using the same query.

How can I solve this problem?

Thank you so much

blast biopython • 400 views
ADD COMMENTlink modified 12 months ago by WouterDeCoster43k • written 12 months ago by saingamp10

can you confirm that you use the same database and param settings as the only NBCI blast?

inserting code is simply done by pasting it in your post, then select it and format it with the icon on top of the edit window: enter image description here

However if it's large amount of code it might be more useful (advisable) to use something like code-snippets from github gist for instance , some more info here

revised some to output the data in excel spreadsheet

why ... but why? :-)

ADD REPLYlink modified 12 months ago • written 12 months ago by lieven.sterck7.8k
1

Hello lieven.sterck, I just found out that I can export .xml file from NCBI web-based too. Then, I compared details of parameters recorded in .xml files of NCBI web-based and the Bio.Blast I ran. They were different (i.e. match, mismatch, gap-open, gap-extend). Probably, that is reason why it is different?

ADD REPLYlink written 12 months ago by saingamp10

Hello lieven.sterck 1.) Thank you for the tips, I am looking into it now to revise the post. 2.) I only confirmed that they are the same database and same query but I didnt check other param settings yet ( I actually dont know how to). Could you guide me how to? 3.) I want to output the excel spreadsheet for conveniences in further process. I checked NCBI web-based, they have option to export spreadsheet but the hit title is not included.

Thank you so much

ADD REPLYlink modified 12 months ago • written 12 months ago by saingamp10

On web blast page any blast algorithm you use has an expandable section towards the bottom of the screen that says Algorithm Parameters. Expand that section and make a note of defaults blast is using. You can then use them locally to reproduce web results.

Capture

While it is fine to export blast results as tab- or comma-limited value data using excel to analyze them is fraught with problems. If someone just needs to "view" that data then that application may be fine.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax83k

Hello, Thank you so much. I have one question. It seems like NCBI web-based blast using algorithm based on the article different from the one used in Bio.Blast, in that case, can we change the algorithm to be according to the NCBI web-based?

ADD REPLYlink written 12 months ago by saingamp10

Blast+ algorithms should be identical (i.e. if you are referring to blastn).

ADD REPLYlink written 12 months ago by genomax83k

Why doesn’t Bio.Blast.NCBIWWW.qblast() give the same results as the NCBI BLAST website?

You need to specify the same options – the NCBI often adjust the default settings on the website, and they do not match the >QBLAST defaults anymore. Check things like the gap penalties and expectation threshold.

The BioPython cookbook FAQ #19 suggests the above, which is what Lieven was getting at.

ADD REPLYlink modified 12 months ago • written 12 months ago by Joe16k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1915 users visited in the last hour