Python script modifications to get tabular BLAST output from DEG (Database of Essential Genes)
4.5 years ago
ppnana • 0

I am using a script from GitHub that converts BLAST XML output to tabular format: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py. When I run the script, it gives the following error:

Traceback (most recent call last):
File "blastxml.py", line 328, in <module>
convert(in_file, outfile)
File "blastxmlexcel_sacred.py", line 184, in convert
if re_default_query_id.match(qseqid):
TypeError: expected string or buffer


The only difference between the two BLAST results is that normal BLAST output has these elements:

<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_15661</Iteration_query-ID>
<Iteration_query-def>gi|927988967|gb|ALE41209.1| GDP-mannose 4,6-dehydratase [mycobacterium]</Iteration_query-def>
<Iteration_query-len>340</Iteration_query-len>
<Iteration_hits>


but my BLAST result instead has:

<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_hits>


Kindly suggest the changes required.

Thanks

python biopython script essential genes database • 1.4k views

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.


thanks WouterDeCoster

4.5 years ago
jonasmst ▴ 330

That error is thrown when matching regular expressions with the re module. It's complaining that qseqid is not a string, so I'm assuming it's an int (a number, e.g. 42). Now it doesn't seem like you've linked the right script, as line 328 in the one you linked to is a commented-out print statement, which would not throw an error.

Wherever appropriate, you need to sanity check that qseqid is indeed a string. A crude way to do so is by

qseqid = str(qseqid)


before the call to match().

EDIT: Here's what I think is happening.

qseqid is read from an XML-file you're providing, and is taken from a tag called Iteration_query-ID:

qseqid = elem.findtext("Iteration_query-ID")


Usually, that's some value like Query_15661 (as you provided in your question). Now, since your blast results don't have that tag, I'm guessing

qseqid = elem.findtext("Iteration_query-ID")


returns None. And None is not a "String or buffer" so the call to match() fails:

>>> import re
>>> p = re.compile("ab*")
>>> p.match(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected string or buffer
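
To check the guess about findtext() directly, here is a minimal sketch using the standard library's xml.etree.ElementTree. The two XML strings are trimmed-down copies of the snippets in the question, not real BLAST output, and the variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed-down <Iteration> elements modelled on the two snippets in the question.
NORMAL = """<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_15661</Iteration_query-ID>
  <Iteration_hits></Iteration_hits>
</Iteration>"""

DEG = """<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_hits></Iteration_hits>
</Iteration>"""

# findtext() returns the tag's text if the tag exists, otherwise None by default.
normal_id = ET.fromstring(NORMAL).findtext("Iteration_query-ID")
deg_id = ET.fromstring(DEG).findtext("Iteration_query-ID")

print(repr(normal_id))  # 'Query_15661'
print(repr(deg_id))     # None
```

So with the tag missing, qseqid ends up as None, which is what match() then rejects.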


EDIT 2: To conclude, the code you are using does not support your BLAST results. You'll either have to use something else, modify the code to be robust against missing tags, or insert dummy values for the tags in your BLAST results. I'm not that familiar with BLAST, let alone what you're trying to do here, so I can't tell you which solution is better.
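
If you go for the "modify the code" option, one possible shape for it is sketched below. This is not the script's actual code: get_query_id is a hypothetical helper, the regular expression is a stand-in for whatever re_default_query_id really is in the script, and building a placeholder ID from Iteration_iter-num is my own assumption, not something BLAST does.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in pattern for illustration only; the real script defines its own re_default_query_id.
re_default_query_id = re.compile(r"^Query_\d+$")

def get_query_id(iteration):
    """Return a query ID string for an <Iteration> element, never None."""
    qseqid = iteration.findtext("Iteration_query-ID")
    if qseqid is None:
        # Tag is missing: fall back to a placeholder built from the iteration number.
        # (Assumed naming scheme, chosen only so that later string operations work.)
        iter_num = iteration.findtext("Iteration_iter-num", default="0")
        qseqid = "Query_" + iter_num
    return qseqid

# An <Iteration> without the query-ID tag, as in the question.
elem = ET.fromstring(
    "<Iteration>"
    "<Iteration_iter-num>1</Iteration_iter-num>"
    "<Iteration_hits></Iteration_hits>"
    "</Iteration>"
)

qseqid = get_query_id(elem)
is_default = bool(re_default_query_id.match(qseqid))  # no TypeError now
print(qseqid, is_default)
```

Unlike qseqid = str(qseqid), which would silently turn None into the string "None", this makes the fallback explicit. Whether a placeholder ID is acceptable in your tabular output is something you'd have to decide.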