Question: Python SearchIO: Extracting information from QueryResults?
4 months ago by westin.kosater wrote:

I ran a BLAST search on a FASTA file containing multiple sequences using Python. What I want to do now is extract information from the results and put it in a pandas DataFrame. I want the query ID, the hit ID, and the accession number of each hit. Here's what I've done so far:

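from Bio import SearchIO
from Bio.Blast import NCBIWWW
import pandas as pd

# Read all query sequences from the FASTA file
with open("list.fasta") as fasta_handle:
    fasta_string = fasta_handle.read()

# Run blastx against RefSeq proteins, restricted to human (taxid 9606)
result_handle = NCBIWWW.qblast("blastx", sequence=fasta_string, database="refseq_protein",
                               entrez_query="txid9606[ORGN]")

# Save the XML output so the search does not have to be repeated
with open("my_blast.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

# Parse the saved results; to_dict keys each QueryResult by its query ID
qresults = SearchIO.parse("my_blast.xml", "blast-xml")
search_dict = SearchIO.to_dict(qresults)

query_id = []
hit_list = []

for key, value in search_dict.items():
    query_id.append(key)    # query ID
    hit_list.append(value)  # QueryResult for that query

# Build the dataframe after the list has been filled
tsv_output = pd.DataFrame({"query_id": query_id})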

I have already added the query IDs to the pandas DataFrame. Now I'm looking for a way to extract the ID of every hit in hit_list, which is a list of QueryResult objects. I've looked through the documentation (https://biopython.org/DIST/docs/api/Bio.SearchIO._model.query.QueryResult-class.html), but I don't see any way to extract the hit ID or the hit accession number. Does anyone know how I could do this?

Thank you
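
One possible approach, as a minimal sketch rather than a definitive answer: iterating over a QueryResult yields its Hit objects, and each Hit has an `id` attribute. For results parsed with the 'blast-xml' format, a Hit should also carry an `accession` attribute, so the sketch reads it defensively with `getattr`. The function names `hits_to_rows` and `load_blast_table` and the column names are hypothetical, chosen only for illustration.

```python
def hits_to_rows(qresults):
    """Flatten QueryResult objects into one row per (query, hit) pair."""
    rows = []
    for qresult in qresults:
        # Iterating a QueryResult yields its Hit objects
        for hit in qresult:
            rows.append({
                "query_id": qresult.id,
                "hit_id": hit.id,
                # Assumption: the blast-xml parser sets hit.accession;
                # fall back to None if the attribute is missing
                "hit_accession": getattr(hit, "accession", None),
            })
    return rows


def load_blast_table(xml_path):
    """Parse a BLAST XML file and return a pandas DataFrame of hits."""
    # Imported here so hits_to_rows can be used without these packages
    import pandas as pd
    from Bio import SearchIO

    qresults = SearchIO.parse(xml_path, "blast-xml")
    return pd.DataFrame(hits_to_rows(qresults))
```

With the file from the question, `load_blast_table("my_blast.xml")` should give a table with one row per hit, which can then be written out with `to_csv(..., sep="\t")`. A query with no hits contributes no rows in this sketch.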

xml blast python • 160 views