Extracting homologous proteins from a genome (Blat or Exonerate)
18 months ago


I would like to quickly extract proteins from various novel plant genomes by finding homology with documented proteins (e.g. A. thaliana), for the purpose of phylogenetic analysis.

A recent paper works with an old tool, Blat, that does just that. But the results of Blat are a table of hits (with coordinates). How do I transform this into proteins? I have written a script that slices my query DNA sequence based on the hit coordinates, but this doesn't seem ideal: I would still have to translate the DNA, and there are 6 different ways (reading frames) of translating it.
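To make the six-frame problem concrete, here is a minimal sketch in plain Python (no Biopython) that translates a DNA slice in all six reading frames using the standard genetic code. The function names and the example sequence are made up for illustration. If the hit table already reports strand and frame for each hit, only the matching frame would be needed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to protein strings."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical slice cut out of a genome using hit coordinates.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

A common heuristic is then to keep the frame with the fewest stop codons (`*`), although that can pick the wrong frame for short hits.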

Does anyone here know Blat? Or any nice, easy alternative? Exonerate seems to do the same and also outputs alignments against my putative translated proteins, but I don't know how to extract anything from this format.

EDIT: I'm getting close to it with:

exonerate --model protein2genome araport_genes.pep.fasta b_repanda.fasta --showalignment no --showvulgar no --ryo ">%ti (%tab - %tae)\n%tas\n"
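The `--ryo` template above prints FASTA-like records (a `>id (begin - end)` header followed by sequence), so the output can be read back with a few lines of Python. This is only a sketch: the function name and sample text are hypothetical, and it assumes any non-record lines exonerate writes around the records (command line, hostname, completion message) contain characters other than letters, so they can be skipped.

```python
import re

# Header produced by the template ">%ti (%tab - %tae)".
HEADER = re.compile(r"^>(.+?) \((\d+) - (\d+)\)$")
# Sequence lines are assumed to contain only letters, '*' or '-'.
SEQUENCE_LINE = re.compile(r"^[A-Za-z*-]+$")


def parse_ryo_records(text):
    """Collect target id, coordinates and sequence for each record."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "target": match.group(1),
                "start": int(match.group(2)),
                "end": int(match.group(3)),
                "seq": "",
            }
            records.append(current)
        elif current is not None and SEQUENCE_LINE.match(line):
            # Sequence may be wrapped over several lines; join them.
            current["seq"] += line
        # Any other line (log or status output) is ignored.
    return records


if __name__ == "__main__":
    # Made-up example output, not real exonerate results.
    sample = (
        ">scaffold_1 (100 - 109)\n"
        "ATGGCC\n"
        "TAA\n"
        ">scaffold_2 (5 - 11)\n"
        "ATGTAA\n"
    )
    for rec in parse_ryo_records(sample):
        print(rec["target"], rec["start"], rec["end"], rec["seq"])
```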

Cheers, Ricardo

blat genome proteins phylogeny exonerate • 418 views
18 months ago
JC 12k

From the paper mentioned:

Contig identity was assigned with Blat v.35 using translated DNA against the respective exon reference sets, selecting the highest scoring hit, and contigs with score > 20 and percentage identity > 75% were retained

The authors didn't align the nucleotides from the genome directly; they translated the contigs and compared the translations with the respective proteins.
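The selection rule in the quoted methods (highest-scoring hit per contig, then keep contigs with score > 20 and identity > 75%) could be sketched as below. This is not the paper's code: the `Hit` record and its field names are hypothetical, and it assumes score and percent identity have already been computed for each hit.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One hit from a hit table (hypothetical, simplified fields)."""
    contig: str
    reference: str
    score: float
    identity: float  # percent identity, 0-100


def select_best_hits(hits, min_score=20.0, min_identity=75.0):
    """Keep the top-scoring hit per contig, then apply both thresholds."""
    best = {}
    for hit in hits:
        current = best.get(hit.contig)
        if current is None or hit.score > current.score:
            best[hit.contig] = hit
    # Thresholds are strict ('>'), matching the quoted text.
    return {
        contig: hit
        for contig, hit in best.items()
        if hit.score > min_score and hit.identity > min_identity
    }


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        Hit("contig_1", "exon_A", 150.0, 92.5),
        Hit("contig_1", "exon_B", 80.0, 99.0),
        Hit("contig_2", "exon_C", 18.0, 95.0),
        Hit("contig_3", "exon_D", 60.0, 70.0),
    ]
    for contig, hit in select_best_hits(hits).items():
        print(contig, hit.reference, hit.score, hit.identity)
```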

For your analysis, I think you can annotate your sequences using the closest species, then use Ensembl Plants to retrieve the phylogenetic group and add your sequences to extend the phylogeny.

18 months ago

I think I have found my ideal answer:

Run exonerate

Then in Python:

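from Bio import SearchIO

# Parse exonerate's plain-text alignment output; yields one QueryResult per query
qresults = SearchIO.parse("exonerate_outfile", "exonerate-text")

for qresult in qresults:
    hsp = qresult[0][0]  # first HSP of the first hit for this query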

