Question: Biopython Parsing Of Blast Results - String Format Issue
Zach Powers wrote (8.9 years ago):

Hi Biostar,

I have two FASTA files that I have BLASTed against one another, and I am trying to make a dictionary of the top hits in a simple Query:Hit format using Biopython. However, I am running into an issue with the string format. Here is the script:

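from Bio.Blast import NCBIXML
test_dictionary = {}
blast_records = NCBIXML.parse(open(outfile))
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            # Loop body not shown in the post; assumed to store query -> hit title
            test_dictionary[blast_record.query] = alignment.title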

And here is an example dictionary entry, where u'' surrounds both the key and the value:

u'HA9WEQA08JTIW5': u'gnl|BL_ORD_ID|100 PhosphataseA'

However, if I print the value directly, it appears correct:

print alignment.title
gnl|BL_ORD_ID|100 PhosphataseA

I am sure this is a simple problem that results from my lack of understanding of exactly how Biopython stores its information, but any suggestions would be appreciated.

Thanks, Zach CP

Edit: following DK's answer, I ended up using this formulation, which splits the title and keeps the gene name:

test_dictionary[str(blast_record.query)] = str(alignment.title).split()[1]

The strange u prefix marks a Unicode string in Python 2.

Reply by Peter, 8.9 years ago
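To see why the dictionary shows u'' while print does not, here is a minimal sketch (written for Python 3, with the dictionary entry from the question typed in by hand). Printing a string uses str(), which gives the bare text, whereas displaying a dict shows each key and value through repr(), which adds quotes and, under Python 2, the u prefix.

```python
# Python 3 sketch: the dict entry from the question, typed in by hand.
hits = {u"HA9WEQA08JTIW5": u"gnl|BL_ORD_ID|100 PhosphataseA"}
value = hits["HA9WEQA08JTIW5"]

print(value)        # gnl|BL_ORD_ID|100 PhosphataseA   (str(): bare text)
print(repr(value))  # 'gnl|BL_ORD_ID|100 PhosphataseA' (repr(): quoted; u'...' in Python 2)
print(hits)         # {'HA9WEQA08JTIW5': 'gnl|BL_ORD_ID|100 PhosphataseA'}

# The quotes and u prefix are display syntax only; the stored text is unchanged.
assert value == "gnl|BL_ORD_ID|100 PhosphataseA"
```

So the stored text is already correct; under Python 2, wrapping it in str() mainly changes how the dictionary is displayed.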

You might not want to split it like that. If the gene name is multiple words, you'll only get the first word. I've edited my post to get just the gene name.

Reply by Damian Kao, 8.9 years ago
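A quick sketch of the difference DK describes, using a made-up multi-word title (the title and gene name below are hypothetical, not from the poster's data). The str.split(None, 1) variant is an alternative not used in the thread: it splits only at the first run of whitespace and returns the rest of the title unchanged.

```python
# Hypothetical multi-word hit title, for illustration only
title = "gnl|BL_ORD_ID|42 Acid phosphatase A"

first_word_only = title.split()[1]          # 'Acid': the rest of the name is lost
whole_name = " ".join(title.split()[1:])    # 'Acid phosphatase A' (DK's approach)
whole_name_alt = title.split(None, 1)[1]    # same result; inner spacing kept as-is

print(first_word_only)
print(whole_name)
assert whole_name == whole_name_alt
```

The join version collapses any repeated spaces inside the name, while the split(None, 1) version preserves them; for typical BLAST titles both give the same result.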
Damian Kao wrote (8.9 years ago):

I am not exactly sure what the problem is. Are you saying that the entries inserted into the dictionary show up with u'' surrounding the strings?

If you want to save the string representation of anything from Biopython, always cast it with str(), just to be safe.

So instead of:


do this:


or you can really just do this:

test_dictionary[str(blast_record.query)] = str(alignment.title)

To get just the gene name:

test_dictionary[str(blast_record.query)] = ' '.join(str(alignment.title).split()[1:])
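One more detail worth checking: placed inside the nested loops from the question, the one-liner above runs once per alignment (and per HSP), so each query's entry is overwritten and the value that survives is the last, lowest-scoring hit rather than the top one. Since BLAST by default reports hits in order of decreasing score, taking `alignments[0]` should give the top hit. Below is a minimal sketch of that idea; the function name `top_hit_names` is hypothetical, and it is exercised with SimpleNamespace stand-ins instead of real NCBIXML records so it runs without Biopython or a BLAST file.

```python
from types import SimpleNamespace


def top_hit_names(blast_records):
    """Map each query to the gene name of its first (top-scoring) hit.

    Assumes each record looks like those from Bio.Blast.NCBIXML.parse:
    a `.query` string plus an `.alignments` list of objects with `.title`.
    """
    top_hits = {}
    for record in blast_records:
        if not record.alignments:
            continue  # this query had no hits
        title = record.alignments[0].title  # first hit = best-scoring by default
        # Drop the leading ID field, keep every word of the name (as in DK's answer)
        name = " ".join(title.split()[1:])
        top_hits[record.query] = name or title  # fall back if title is one word
    return top_hits


# Hypothetical stand-in records; only the first title comes from the question.
records = [
    SimpleNamespace(query="HA9WEQA08JTIW5", alignments=[
        SimpleNamespace(title="gnl|BL_ORD_ID|100 PhosphataseA"),
        SimpleNamespace(title="gnl|BL_ORD_ID|42 Acid phosphatase A"),
    ]),
    SimpleNamespace(query="QUERY_WITHOUT_HITS", alignments=[]),
]
print(top_hit_names(records))  # {'HA9WEQA08JTIW5': 'PhosphataseA'}
```

With real data you would pass the parser output directly, e.g. `top_hit_names(NCBIXML.parse(open(outfile)))`. Under Python 2 you could still wrap the values in str() as suggested above; under Python 3 the strings display without the u prefix anyway.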

Thanks, DK. I am learning basic programming backwards, by writing scripts that get progressively better. Sometimes things that are obvious to others are tough for me to figure out.

Reply by Zach Powers, 8.9 years ago