when running alignment using tblastn via biopython, the subject string that is returned is the translated version of the DNA strand we blasted against. Is there a way to return the original DNA string, instead of the translated version?
This is a limitation of the BLAST XML output itself: it stores the translated subject sequence, not the original DNA. Biopython only parses this output into a user-friendly data structure, so when the original sequence is not in the BLAST XML file, Biopython has nothing to return.
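To see what the XML actually carries for a subject hit, here is a minimal sketch using only the standard library. The sample report is a hypothetical, heavily trimmed fragment; a real tblastn XML file has many more elements and would be read with `ET.parse(path).getroot()`. The element names (`Hit_id`, `Hsp_hit-from`, `Hsp_hit-to`, `Hsp_hit-frame`, `Hsp_hseq`) are standard BLAST XML. Note that the HSP gives coordinates, frame and the translated `Hsp_hseq`, but no DNA.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed tblastn XML report: most elements are omitted.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>contig1</Hit_id><Hit_hsps><Hsp>
<Hsp_hit-from>4</Hsp_hit-from><Hsp_hit-to>9</Hsp_hit-to>
<Hsp_hit-frame>1</Hsp_hit-frame><Hsp_hseq>FG</Hsp_hseq>
</Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def subject_hsp_fields(blast_xml: str) -> list:
    """Collect what the XML records about each subject HSP."""
    root = ET.fromstring(blast_xml)
    fields = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            fields.append({
                "hit_id": hit_id,
                "hit_from": int(hsp.findtext("Hsp_hit-from")),  # 1-based coordinate
                "hit_to": int(hsp.findtext("Hsp_hit-to")),
                "hit_frame": int(hsp.findtext("Hsp_hit-frame")),  # negative = minus strand
                "hseq": hsp.findtext("Hsp_hseq"),  # translated subject, not DNA
            })
    return fields


print(subject_hsp_fields(SAMPLE_XML))
```

The coordinates and frame are what make it possible to cut the matching region out of the database sequence, if you have that sequence from somewhere else.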
You could reverse translate the given protein sequence. But because of codon redundancy, a protein sequence usually corresponds to many possible DNA sequences, so in general the original DNA cannot be recovered from the BLAST XML file alone.
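A quick way to see how ambiguous reverse translation is: count how many DNA sequences encode a given peptide. This sketch assumes the NCBI standard genetic code (translation table 1), built from its usual 64-letter amino-acid string, and expects ungapped input (strip `-` from `Hsp_hseq` first; ambiguity letters such as `X` raise an error).

```python
from collections import Counter
from itertools import product
from math import prod

# NCBI standard genetic code (table 1); codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons for each amino acid ('*' is stop).
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_back_translations(protein: str) -> int:
    """Number of distinct DNA sequences that translate to `protein` under table 1."""
    protein = protein.upper()
    unknown = set(protein) - set(CODONS_PER_AA)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return prod(CODONS_PER_AA[aa] for aa in protein)


print(count_back_translations("LSR"))         # 6 * 6 * 6 = 216
print(count_back_translations("LLLLLLLLLL"))  # 6 ** 10 = 60466176
```

Only methionine (M) and tryptophan (W) have a single codon, so anything longer than a few residues already has far too many candidates to pick the real one.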
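If you have the database as a FASTA file, you can look up each hit's original DNA sequence by its ID. Unless that FASTA file is quite small, rather than building an in-memory dictionary with Bio.SeqIO.to_dict(), Bio.SeqIO.index() or Bio.SeqIO.index_db() would probably be more sensible.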
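A minimal sketch of the FASTA lookup, assuming the hit IDs in the BLAST report match the FASTA record IDs, and that Bio.SearchIO reports coordinates as 0-based, half-open with `hit_start < hit_end` and a negative `hit_frame` for minus-strand hits. The helper names `extract_hit_dna` and `original_hit_dna` are hypothetical. Biopython is imported inside `original_hit_dna` so the coordinate helpers work on their own.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Reverse complement (IUPAC ambiguity codes other than N are not handled)."""
    return dna.translate(_COMPLEMENT)[::-1]


def extract_hit_dna(subject_dna: str, hit_start: int, hit_end: int, hit_frame: int) -> str:
    """Return the coding DNA behind one tblastn HSP (0-based, half-open coordinates)."""
    if not 0 <= hit_start < hit_end <= len(subject_dna):
        raise ValueError("HSP coordinates fall outside the subject sequence")
    region = subject_dna[hit_start:hit_end]
    # Minus-strand hits were translated from the reverse complement of the region.
    return region if hit_frame > 0 else reverse_complement(region)


def original_hit_dna(blast_xml_path: str, fasta_path: str):
    """Yield (query_id, hit_id, dna) for every HSP in a tblastn XML report."""
    from Bio import SearchIO, SeqIO  # Biopython, imported only when needed

    db = SeqIO.index(fasta_path, "fasta")  # indexed on disk, records read on demand
    for query in SearchIO.parse(blast_xml_path, "blast-xml"):
        for hit in query:
            subject = str(db[hit.id].seq)
            for hsp in hit:
                yield query.id, hit.id, extract_hit_dna(
                    subject, hsp.hit_start, hsp.hit_end, hsp.hit_frame
                )
```

If the IDs do not line up (for example when BLAST reports its own internal IDs), the `db[hit.id]` lookup will raise a `KeyError`, which is the same ID problem discussed in the link below.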
Or, and this is a good plan if you don't have a FASTA file of the database, you could use blastdbcmd, although that isn't always as easy as it should be: http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html
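A rough sketch of calling blastdbcmd from Python for one HSP, assuming blastdbcmd is on the PATH, the database lets you look entries up by the ID BLAST reported (typically it must have been built with `-parse_seqids`), and the coordinates are 0-based, half-open as in Bio.SearchIO. It uses blastdbcmd's `-range` (1-based, inclusive `FROM-TO`) and `-strand` options; the function names are hypothetical.

```python
import subprocess


def blastdbcmd_args(db: str, entry: str, hit_start: int, hit_end: int, hit_frame: int) -> list:
    """Build a blastdbcmd command for one HSP's subject region."""
    # Convert 0-based half-open coordinates to blastdbcmd's 1-based inclusive range.
    strand = "plus" if hit_frame > 0 else "minus"
    return [
        "blastdbcmd",
        "-db", db,
        "-entry", entry,
        "-range", f"{hit_start + 1}-{hit_end}",
        "-strand", strand,
    ]


def fetch_region(db: str, entry: str, hit_start: int, hit_end: int, hit_frame: int) -> str:
    """Run blastdbcmd and return the sequence with the FASTA header stripped."""
    result = subprocess.run(
        blastdbcmd_args(db, entry, hit_start, hit_end, hit_frame),
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))
```

Keeping the argument building separate from the subprocess call makes it easy to print and check the command by hand before running it against a large database.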
Could you explain a little more in depth what problem you are experiencing? What were your inputs and what outputs did you expect from biopython?