Question: Python SearchIO: Extracting information from QueryResults?
7 months ago, westin.kosater wrote:

I ran a BLAST search of a FASTA file containing multiple sequences using Biopython. Now I want to extract information from the results and put it in a pandas DataFrame: the query ID, the hit ID, and the accession number of each hit. Here's what I've done so far:

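from Bio.Blast import NCBIWWW
from Bio import SearchIO
import pandas as pd

# Submit all sequences in the FASTA file to NCBI (blastx vs. human RefSeq proteins)
with open("list.fasta") as fasta_handle:
    fasta_string = fasta_handle.read()
result_handle = NCBIWWW.qblast("blastx", sequence=fasta_string, database="refseq_protein",
                               entrez_query="txid9606[ORGN]")

# Save the XML output so it can be re-parsed without re-running the remote search
with open("my_blast.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

# Parse the saved XML into QueryResult objects, keyed by query ID
qresults = SearchIO.parse("my_blast.xml", "blast-xml")
search_dict = SearchIO.to_dict(qresults)

query_id = []
hit_list = []
for key, value in search_dict.items():
    query_id.append(key)    # query ID (string)
    hit_list.append(value)  # QueryResult object for that query

tsv_output = pd.DataFrame(query_id, columns=["query_id"])  # Initialize pandas dataframe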

I've already added the query IDs to the pandas DataFrame. Now I'm looking for a way to extract the ID of every hit from the items in hit_list, which is a list of QueryResult objects. I've looked through the documentation, but I don't see any way to extract the hit ID or the hit accession number. Does anyone know how I could do this?

Thank you
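A minimal sketch of one way to do this, assuming the results come from the "blast-xml" parser as above: in Bio.SearchIO, iterating over a QueryResult yields its Hit objects, both QueryResult and Hit have an `.id` attribute, and hits parsed from BLAST XML also carry an `.accession` attribute (taken from the `<Hit_accession>` element). If only the hit IDs are needed, `qresult.hit_keys` returns them as a list. The helper name `query_hits_to_rows` is hypothetical, not part of Biopython; it produces one row per query/hit pair.

```python
def query_hits_to_rows(qresults):
    """Flatten SearchIO QueryResults into (query_id, hit_id, accession) tuples.

    Assumes Bio.SearchIO behaviour: iterating a QueryResult yields Hit
    objects, and QueryResult/Hit both expose `.id`. Hits parsed from
    BLAST XML also have `.accession`.
    """
    rows = []
    for qresult in qresults:
        for hit in qresult:
            # getattr fallback: parsers other than blast-xml may not set .accession
            rows.append((qresult.id, hit.id, getattr(hit, "accession", None)))
    return rows
```

With the objects from the question, `query_hits_to_rows(hit_list)` (or `query_hits_to_rows(search_dict.values())`) gives rows that could be passed to `pd.DataFrame(rows, columns=["query_id", "hit_id", "accession"])`. Queries with no hits simply contribute no rows in this sketch.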

Tags: xml, blast, python