I would like to quickly extract proteins from various novel plant genomes, by finding homology with documented proteins (ex: A. thaliana), for the purpose of phylogenetic analysis.
A recent paper works with an old tool, Blat, that does just that. But the results of blat are a table of hits (with coordinates). How do transform this into proteins? I have created a script that parses my query DNA sequence based on the hit coordinates, but this doesn't seem ideal, I would have to translate the DNA there are 6 diferent ways of translating..
Does anyone know blat here? Or any nice easy alternative? Exonerate seems to do the same and also outputs alignments against my putative translated proteins, but I don't know how to extract anything from this format..
EDIT: I'm getting close to it with:
exonerate --model protein2genome araport_genes.pep.fasta b_repanda.fasta --showalignment no --showvulgar no --ryo ">%ti (%tab - %tae)\n%tas\n"