I have two FASTA files containing all of the proteins from two distantly related organisms (a Spirochaetes and a Firmicutes). I want to map each gene from the Firmicutes to its best hit in the Spirochaetes.
What is the best, and most widely accepted, way to do this?
I'm very familiar with Python, and my first thought was to use skbio to do a pairwise alignment of all of the proteins (http://scikitbio.org/docs/0.4.1/generated/skbio.alignment.StripedSmithWaterman.html). However, since that is a local alignment, it may give a high score for a single shared domain, which is not what I want.
I then thought about using BioPython and its BLAST wrapper, but I don't know how to specify the query database or a length threshold (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87).
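
To make the BLAST route concrete, here is a minimal sketch, assuming the NCBI BLAST+ command-line tools are installed and that tabular output (`-outfmt 6`) with query and subject lengths is acceptable. The function names, the file names and the 70% coverage cutoff are hypothetical choices for illustration, not an established standard. `-subject` takes a FASTA file directly; building a database with `makeblastdb` and passing `-db` is the usual alternative for larger sets.

```python
import csv
import io

# Columns requested from BLAST+ tabular output (-outfmt 6 with explicit fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen", "evalue", "bitscore"]


def blastp_command(query_fasta, subject_fasta, out_path, evalue=1e-5):
    """Build a blastp argument list (hypothetical helper, paths are placeholders)."""
    return [
        "blastp",
        "-query", query_fasta,      # e.g. the Firmicutes proteins
        "-subject", subject_fasta,  # e.g. the Spirochaetes proteins
        "-evalue", str(evalue),
        "-outfmt", "6 " + " ".join(FIELDS),
        "-out", out_path,
    ]


def best_hits(handle, min_coverage=0.7):
    """Return the highest-bitscore hit per query that passes a coverage filter.

    Coverage is approximated as alignment length over sequence length
    (alignment length includes gaps, so this is only a rough measure).
    Requiring it for both query and subject discards hits that cover
    only a single shared domain. The 0.7 default is an assumption.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        aln_len = int(hit["length"])
        coverage = min(aln_len / int(hit["qlen"]), aln_len / int(hit["slen"]))
        if coverage < min_coverage:
            continue
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

The search itself could be run with `subprocess.run(blastp_command("firmicutes.faa", "spirochaetes.faa", "hits.tsv"), check=True)`, after which `best_hits(open("hits.tsv"))` gives one hit per Firmicutes protein. Running the search in both directions and keeping only pairs that are each other's best hit (reciprocal best hits) is a common stricter variant.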