I performed a BLAST search of a FASTA file with multiple sequences using Python. Now I want to extract information from the results and put it in a pandas DataFrame: the query ID, the hit ID, and the accession number of each hit. Here's what I've done so far:
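
    from Bio import SearchIO
    from Bio.Blast import NCBIWWW, NCBIXML
    import pandas as pd

    # Read all query sequences and submit them to NCBI BLAST (blastx, human RefSeq proteins)
    with open("list.fasta") as fasta_handle:
        fasta_string = fasta_handle.read()
    result_handle = NCBIWWW.qblast("blastx", sequence=fasta_string, database="refseq_protein", entrez_query='txid9606[ORGN]')

    # Save the XML results locally so they can be re-parsed without re-running the search
    with open("my_blast.xml", 'w') as out_handle:
        out_handle.write(result_handle.read())
    result_handle.close()

    result_handle = open("my_blast.xml")
    blast_records = NCBIXML.parse(result_handle)

    # Parse the same file with SearchIO: one QueryResult per query sequence, keyed by query ID
    qresults = SearchIO.parse('my_blast.xml', 'blast-xml')
    search_dict = SearchIO.to_dict(qresults)

    query_id = []
    hit_list = []
    for key, value in search_dict.items():
        query_id.append(key)
        hit_list.append(value)
    tsv_output = pd.DataFrame({"query_id": query_id})  # Build the DataFrame once the list is filled
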
I already added the query IDs to the pandas DataFrame. Now I'm looking for a way to extract the ID of every result in hit_list, which is a list of QueryResult objects. I've looked through the documentation (https://biopython.org/DIST/docs/api/Bio.SearchIO._model.query.QueryResult-class.html), but I don't see any way to extract the hit ID or the hit accession number. Does anyone know how I could do this?
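
To make the goal concrete, here is a minimal sketch of the table I'm after, not a definitive solution. It relies on two things: iterating over a QueryResult yields its Hit objects, and each Hit parsed from BLAST XML carries id and accession attributes. The helper names hits_to_rows and load_hit_table are hypothetical, and the column names are my own choice.

```python
def hits_to_rows(qresults):
    """Flatten QueryResult objects into one row per (query, hit) pair."""
    rows = []
    for qresult in qresults:
        # Iterating over a QueryResult yields its Hit objects.
        for hit in qresult:
            rows.append({
                "query_id": qresult.id,
                "hit_id": hit.id,
                # The blast-xml parser sets `accession`; fall back to None
                # in case another parser does not provide it.
                "accession": getattr(hit, "accession", None),
            })
    return rows


def load_hit_table(xml_path):
    """Parse a BLAST XML file into a pandas DataFrame (hypothetical helper)."""
    # Imported here so hits_to_rows stays usable without Biopython or pandas.
    import pandas as pd
    from Bio import SearchIO

    rows = hits_to_rows(SearchIO.parse(xml_path, "blast-xml"))
    return pd.DataFrame(rows, columns=["query_id", "hit_id", "accession"])
```

With the file from above this would be called as tsv_output = load_hit_table("my_blast.xml"), and tsv_output.to_csv("hits.tsv", sep="\t", index=False) would write it out as TSV. Queries without any hits produce no rows in this layout.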