Question: Blast parse script: ValueError: Required query and/or hit ID field not found.
Posted 8 months ago by bioinfoSeeker20 (United Kingdom):

Hi, I am trying to write a script to parse and summarize NCBI web BLAST output (14 columns). I am using Biopython's SearchIO on the tab-delimited output file, which includes comment lines by default. I get the error ValueError: Required query and/or hit ID field not found.

I tried removing the openpyxl part of the script that does the further summarizing and running only the bare-bones SearchIO parsing, but the error persists.

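from Bio import SearchIO

file = "out.txt"
# comments=True: the file contains '#' comment lines (commented tabular output)
blast_generator = SearchIO.parse(file, 'blast-tab', comments=True)
for blast_qresult in blast_generator:
    print(blast_qresult)
    for k, blast_hit in enumerate(blast_qresult):
        print(k)
        print(blast_hit)
        query = blast_qresult.id  # ID of the current query
        print(query)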
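Before looking at field names, it may be worth ruling out the file layout: the 'blast-tab' parser expects tab-separated columns, and a copy-pasted or re-saved web download can end up space-separated. The sketch below is plain Python (no Biopython), and the function name `column_mismatches` is hypothetical. It compares the number of names on the `# Fields:` line with the number of tab-separated columns in each data row.

```python
def column_mismatches(lines):
    """Return (line_number, n_columns) for data rows whose tab-separated
    column count differs from the number of names on the '# Fields:' line.
    Hypothetical helper, plain Python only."""
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("# Fields:"):
            # In the commented header, field names are separated by ', '.
            names = line[len("# Fields:"):].strip().split(", ")
            expected = len(names)
        elif line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        elif expected is not None:
            n_columns = len(line.split("\t"))
            if n_columns != expected:
                bad.append((number, n_columns))
    return bad
```

Usage: `column_mismatches(open("out.txt"))`; an empty list means every data row had as many tab-separated columns as the header lists.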

I then reran the script, passing the column headers exactly as they appear in the BLAST report:

custom_fields = 'query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score'
blast_generator = SearchIO.parse(file, 'blast-tab', fields=custom_fields, comments=True)

I still get the same error.
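One likely reason the verbatim header does not help: as far as I can tell from the Bio.SearchIO.BlastIO source, a `fields` string is split on single spaces and each token is expected to be a BLAST short column name (qseqid, sseqid, ...), not the long names from the `# Fields:` line. That is an assumption worth checking against your Biopython version. The plain-Python snippet below only reproduces that split, to show what the parser would receive.

```python
# The long header names, passed verbatim as in the question.
custom_fields = ('query id, subject ids, query acc.ver, subject acc.ver, % identity, '
                 'alignment length, mismatches, gap opens, q. start, q. end, '
                 's. start, s. end, evalue, bit score')

# Assumption: the parser splits a string `fields` on single spaces.
tokens = custom_fields.split(" ")
print(tokens[:4])

# None of the tokens is a BLAST ID column name, so an ID-field check would fail.
id_columns = {"qseqid", "qacc", "qaccver", "sseqid", "sacc", "saccver"}
print(id_columns.intersection(tokens))
```

The same split would also turn comma-separated short names into tokens such as 'qseqid,' (with the trailing comma), which is not the same string as 'qseqid'.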

Can you please advise me where I'm going wrong?


As a follow-up to my post, here is a snippet of the BLAST report file I'm trying to parse.

# blastn                                                    
# Iteration: 0                                                  
# Query: 000000577|size:2297896                                                 
# RID: xxxxxx
# Database: nr                                                  
# Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score                                             
# 104 hits found                                                    
000000577|size:2297896  gi|1213423373|gb|CP022124.1|    000000577|size:2297896  CP022124.1  100 252 0   0   1   252 198071  197820  1.30E-127   466
000000577|size:2297896  gi|1213423373|gb|CP022124.1|    000000577|size:2297896  CP022124.1  100 252 0   0   1   252 534111  533860  1.30E-127   466
000000577|size:2297896  gi|1213423373|gb|CP022124.1|    000000577|size:2297896  CP022124.1  100 252 0   0   1   252 725655  725906  1.30E-127   466
000000577|size:2297896  gi|1213423373|gb|CP022124.1|    000000577|size:2297896  CP022124.1  100 252 0   0   1   252 814744  814995  1.30E-127   466
000000577|size:2297896  gi|1213423373|gb|CP022124.1|    000000577|size:2297896  CP022124.1  100 252 0   0   1   252 1178261 1178512 1.30E-127   466
000000577|size:2297896  gi|1188266913|dbj|LC268264.1|   000000577|size:2297896  LC268264.1  100 252 0   0   1   252 156 407 1.30E-127   466
000000577|size:2297896  gi|1188266138|dbj|LC268202.1|   000000577|size:2297896  LC268202.1  100 252 0   0   1   252 156 407 1.30E-127   466
000000577|size:2297896  gi|1188263076|dbj|LC264101.1|   000000577|size:2297896  LC264101.1  100 252 0   0   1   252 156 407 1.30E-127   466
000000577|size:2297896  gi|1188262847|dbj|LC264912.1|   000000577|size:2297896  LC264912.1  100 252 0   0   1   252 156 407 1.30E-127   466
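Since the report carries its own `# Fields:` line, one option is to translate those long names into BLAST+ short column names and pass the result as a space-separated `fields` string. The mapping below covers only the 14 columns in this report and follows the BLAST+ tabular header names (for example "subject ids" is sallseqid, "% identity" is pident, "bit score" is bitscore); the helper name `fields_from_header` is hypothetical.

```python
# BLAST+ long header names -> short -outfmt specifiers, for the 14 columns here only.
LONG_TO_SHORT = {
    "query id": "qseqid",
    "subject ids": "sallseqid",
    "query acc.ver": "qaccver",
    "subject acc.ver": "saccver",
    "% identity": "pident",
    "alignment length": "length",
    "mismatches": "mismatch",
    "gap opens": "gapopen",
    "q. start": "qstart",
    "q. end": "qend",
    "s. start": "sstart",
    "s. end": "send",
    "evalue": "evalue",
    "bit score": "bitscore",
}


def fields_from_header(line):
    """Turn a '# Fields: ...' comment line into a space-separated short-name string."""
    if not line.startswith("# Fields:"):
        raise ValueError("not a '# Fields:' line")
    long_names = line[len("# Fields:"):].strip().split(", ")
    # A KeyError here means a column this small table does not cover.
    return " ".join(LONG_TO_SHORT[name] for name in long_names)
```

For the header above this returns 'qseqid sallseqid qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore'. Note that the second column maps to sallseqid; the later attempt in this thread used sseqid for that column, and that string was accepted by the parser.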
Posted 8 months ago by bioinfoSeeker20 (United Kingdom):

I looked at the SearchIO documentation (http://biopython.org/DIST/docs/api/Bio.SearchIO.BlastIO-pysrc.html) and used the table there to replace the corresponding fields from the BLAST report in the script:

custom_fields = 'qseqid, sseqid, qaccver, sacc_ver, nident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, score'

This still gave the same error. I then removed the commas and kept only the spaces, to see if that would help:

custom_fields = 'qseqid sseqid qaccver sacc_ver nident length mismatch gapopen qstart qend sstart send evalue score'
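A quick way to catch a misspelled column name before parsing is to check each token against the BLAST+ short names. The set below is a partial list of -outfmt 6/7 specifiers written out by hand (not read from Biopython), and `unknown_fields` is a hypothetical helper. Note that nident and score are valid names but refer to different columns than "% identity" and "bit score" (those are pident and bitscore), so a name check alone will not catch them.

```python
# Partial, hand-written list of BLAST+ tabular (-outfmt 6/7) column specifiers.
KNOWN_COLUMNS = {
    "qseqid", "qgi", "qacc", "qaccver", "qlen",
    "sseqid", "sallseqid", "sgi", "sallgi", "sacc", "saccver", "sallacc", "slen",
    "qstart", "qend", "sstart", "send", "qseq", "sseq",
    "evalue", "bitscore", "score", "length", "pident", "nident",
    "mismatch", "positive", "gapopen", "gaps", "ppos",
    "qframe", "sframe", "staxids", "stitle", "salltitles", "sstrand", "qcovs",
}


def unknown_fields(fields):
    """Return the tokens of a space-separated fields string that are not known column names."""
    return [name for name in fields.split() if name not in KNOWN_COLUMNS]


custom_fields = 'qseqid sseqid qaccver sacc_ver nident length mismatch gapopen qstart qend sstart send evalue score'
print(unknown_fields(custom_fields))
```

Here the check flags only sacc_ver; the BLAST+ name for "subject acc.ver" is saccver.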

This resolved the error when printing the QueryResult. However, I now get "AttributeError: 'Hit' object has no attribute 'sacc_ver'" when I iterate over the QueryResult and its Hit objects, because the headers no longer match. I will post that error separately.
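The AttributeError says the Hit object has no attribute literally named sacc_ver: the column names passed in `fields` are not necessarily the attribute names Biopython exposes on QueryResult, Hit and HSP objects (in the Biopython versions I have looked at, saccver is stored as Hit.accession_version, but please confirm with `dir(hit)`). A defensive pattern is to look attributes up with `getattr` and a default, so a wrong name shows up as None instead of stopping the loop. The sketch uses `types.SimpleNamespace` as a stand-in for a Hit; `read_attrs` is a hypothetical helper.

```python
from types import SimpleNamespace


def read_attrs(obj, names):
    """Return ({name: value or None}, [missing names]) for the requested attributes."""
    values = {name: getattr(obj, name, None) for name in names}
    missing = [name for name in names if not hasattr(obj, name)]
    return values, missing


# Stand-in object for illustration only; a real Hit comes from SearchIO.parse.
hit = SimpleNamespace(id="gi|1213423373|gb|CP022124.1|", accession_version="CP022124.1")

values, missing = read_attrs(hit, ["id", "accession_version", "sacc_ver"])
print(values)
print("missing:", missing)
```

With a real Hit, `print(dir(hit))` lists the attribute names that are actually available.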

Powered by Biostar version 2.3.0