Question: Python script modifications to get tabular output of BLAST from DEG (Database of Essential Genes)
ppnana wrote (2.8 years ago):

I have a script, available on GitHub, that converts BLAST output from XML to tabular format (https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py), but when I run it I get the following error:

Traceback (most recent call last):
  File "blastxml.py", line 328, in <module>
    convert(in_file, outfile)
  File "blastxmlexcel_sacred.py", line 184, in convert
    if re_default_query_id.match(qseqid):
TypeError: expected string or buffer

The only difference between the two BLAST results is that a normal BLAST result has these elements:

<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_15661</Iteration_query-ID>
  <Iteration_query-def>gi|927988967|gb|ALE41209.1| GDP-mannose 4,6-dehydratase [mycobacterium]</Iteration_query-def>
  <Iteration_query-len>340</Iteration_query-len>
<Iteration_hits>

but my BLAST result instead has:

<Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
    <Iteration_hits>

Kindly suggest the changes required.

Thanks


jonasmst (Norway/Oslo) wrote (2.8 years ago):

That error is thrown when matching regular expressions with the re module. It's complaining that qseqid is not a string, so I'm assuming it's an int (a number, e.g. 42). It also doesn't look like you've linked the right script: line 328 in the one you linked to is a commented-out print statement, which would not throw an error.

Wherever appropriate, you need to sanity-check that qseqid is indeed a string. A crude way to do so is to add

qseqid = str(qseqid)

before the call to match().
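
One caveat with str(): it converts any value, so a missing value (None) would become the text "None" instead of being reported. A slightly more explicit check could look like the minimal sketch below. The helper name require_text is hypothetical and is not part of the linked script; the basestring fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    string_types = (basestring,)  # Python 2: covers str and unicode
except NameError:
    string_types = (str,)         # Python 3


def require_text(value, name):
    """Return value unchanged if it is a string, otherwise raise a clear error.

    Hypothetical helper for illustration; not part of blastxml_to_tabular.py.
    """
    if not isinstance(value, string_types):
        raise ValueError("%s should be a string, got %r" % (name, value))
    return value


# str() hides a missing value instead of reporting it:
print(str(None) == "None")  # True
```

Calling qseqid = require_text(qseqid, "qseqid") before match() would then fail with a message that names the offending value, rather than with the TypeError from the re module.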

EDIT: Here's what I think is happening.

qseqid is read from an XML-file you're providing, and is taken from a tag called Iteration_query-ID:

qseqid = elem.findtext("Iteration_query-ID")

Usually, that's some value like Query_15661 (as you provided in your question). Now, since your blast results don't have that tag, I'm guessing

qseqid = elem.findtext("Iteration_query-ID")

returns None. None is not a "string or buffer", so the call to match() fails:

>>> import re
>>> p = re.compile("ab*")
>>> p.match(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected string or buffer
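
The guess about findtext() can be checked with the standard library's xml.etree.ElementTree. The sketch below uses a fragment modeled on the one in the question; the closing tags are added here only so that the fragment parses on its own, and this is not necessarily how the linked script reads the file.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the question: an Iteration without Iteration_query-ID.
# Iteration_hits is closed here only so the fragment is well-formed.
fragment = """<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_hits></Iteration_hits>
</Iteration>"""

elem = ET.fromstring(fragment)

# The tag is absent, so findtext() falls back to its default, which is None.
qseqid = elem.findtext("Iteration_query-ID")
print(qseqid is None)  # True

# A tag that is present returns its text as a string.
iter_num = elem.findtext("Iteration_iter-num")
print(iter_num)  # 1
```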

EDIT 2: To conclude, the code you are using does not support your BLAST results. You'll have to use something else, modify the code to be robust against missing tags, or insert dummy values for the missing tags in your BLAST results. I'm not that familiar with BLAST, let alone with what you're trying to do here, so I can't tell you which solution is better.
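
As a rough sketch of the "robust against missing tags" option: the helper below falls back to a placeholder when Iteration_query-ID is absent. The function name query_id_or_dummy and the placeholder format "Query_<iter-num>" are assumptions made for illustration, not something the linked script defines. The script may also read Iteration_query-def and Iteration_query-len, which are missing from your results as well, so those might need similar treatment.

```python
import xml.etree.ElementTree as ET


def query_id_or_dummy(iteration):
    """Return the Iteration_query-ID text, or a placeholder if the tag is missing.

    Hypothetical helper; the placeholder format is an assumption.
    """
    qseqid = iteration.findtext("Iteration_query-ID")
    if qseqid is None:
        # Build a dummy ID from the iteration number, which is still present.
        iter_num = iteration.findtext("Iteration_iter-num", default="unknown")
        qseqid = "Query_%s" % iter_num
    return qseqid


# One iteration with the tag and one without, modeled on the question.
with_tag = ET.fromstring(
    "<Iteration>"
    "<Iteration_iter-num>1</Iteration_iter-num>"
    "<Iteration_query-ID>Query_15661</Iteration_query-ID>"
    "</Iteration>"
)
without_tag = ET.fromstring(
    "<Iteration><Iteration_iter-num>1</Iteration_iter-num></Iteration>"
)

print(query_id_or_dummy(with_tag))     # Query_15661
print(query_id_or_dummy(without_tag))  # Query_1
```

Either way, the value passed to match() is then always a string, so the TypeError should no longer occur at that line.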
