Question: Python SearchIO: Extracting information from QueryResults?
17 months ago, westin.kosater wrote:

I ran a BLAST search on a FASTA file containing multiple sequences using Python. I now want to extract information from the results and put it in a pandas DataFrame: the query ID, the hit ID, and the accession number of each hit. Here's what I've done so far:

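from Bio import SearchIO
from Bio.Blast import NCBIWWW
import pandas as pd

# Read all query sequences from the FASTA file
with open("list.fasta") as fasta_handle:
    fasta_string = fasta_handle.read()

# Run blastx against RefSeq proteins, restricted to human (txid9606)
result_handle = NCBIWWW.qblast("blastx", sequence=fasta_string, database="refseq_protein",
                               entrez_query="txid9606[ORGN]")

# Save the XML output so the search does not have to be repeated
with open("my_blast.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

# Parse the saved file; each QueryResult corresponds to one query sequence
qresults = SearchIO.parse("my_blast.xml", "blast-xml")
search_dict = SearchIO.to_dict(qresults)

query_id = []
hit_list = []
for key, value in search_dict.items():
    query_id.append(key)    # query ID
    hit_list.append(value)  # QueryResult holding the hits for that query

# Build the DataFrame once the list of query IDs is filled
tsv_output = pd.DataFrame({"query_id": query_id})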

I have already added the query IDs to the pandas DataFrame. Now I'm looking for a way to extract the ID of every hit in hit_list, which is a list of QueryResult objects. I've looked through the documentation (https://biopython.org/DIST/docs/api/Bio.SearchIO._model.query.QueryResult-class.html), but I don't see any way to extract the hit ID or the hit accession number. Does anyone know how I could do this?

Thank you
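
For reference, one possible approach is sketched below. It relies on the fact that iterating over a QueryResult yields its Hit objects, and that for the blast-xml format each Hit should expose `id` and `accession` attributes. The helper name `hits_to_rows`, the stand-in classes, the placeholder IDs, and the output filename `hits.tsv` are all assumptions made for illustration; the stand-ins only exist so the sketch runs without a BLAST result file.

```python
from types import SimpleNamespace


def hits_to_rows(qresults):
    """Flatten QueryResult-like objects into one dict per (query, hit) pair."""
    rows = []
    for qresult in qresults:
        for hit in qresult:  # iterating a QueryResult yields its Hit objects
            rows.append({
                "query_id": qresult.id,
                "hit_id": hit.id,
                # accession is format-dependent, so fall back to None if absent
                "hit_accession": getattr(hit, "accession", None),
            })
    return rows


class FakeQueryResult(list):
    """Hypothetical stand-in for a QueryResult: an iterable of hits with an id."""

    def __init__(self, id, hits):
        super().__init__(hits)
        self.id = id


# Placeholder data (invented IDs, not real BLAST output)
demo_results = [
    FakeQueryResult("query_1", [
        SimpleNamespace(id="hit_A", accession="ACC_A"),
        SimpleNamespace(id="hit_B", accession="ACC_B"),
    ]),
    FakeQueryResult("query_2", [SimpleNamespace(id="hit_C", accession="ACC_C")]),
    FakeQueryResult("query_3", []),  # a query with no hits contributes no rows
]

rows = hits_to_rows(demo_results)
for row in rows:
    print(row["query_id"], row["hit_id"], row["hit_accession"], sep="\t")

# With Biopython and pandas installed, the same function should work on real results:
#   from Bio import SearchIO
#   import pandas as pd
#   rows = hits_to_rows(SearchIO.parse("my_blast.xml", "blast-xml"))
#   tsv_output = pd.DataFrame(rows)
#   tsv_output.to_csv("hits.tsv", sep="\t", index=False)
```

Building the rows first and creating the DataFrame once at the end avoids growing a DataFrame inside the loop. Each row pairs one query with one hit, so a query with several hits appears on several rows.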

Tags: xml, blast, python