Question: Extracting homologous proteins from a genome (Blat or Exonerate)
4 days ago by
ricardoguerreiro212130 wrote:


I would like to quickly extract proteins from various novel plant genomes by finding homology with documented proteins (e.g. A. thaliana), for the purpose of phylogenetic analysis.

A recent paper uses an older tool, Blat, that does just that. But Blat's output is a table of hits (with coordinates). How do I transform this into proteins? I have written a script that slices my query DNA sequence at the hit coordinates, but this doesn't seem ideal: I would still have to translate the DNA, and there are 6 different reading frames to translate in.
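On the six-frame point, here is a minimal sketch in plain Python of translating an extracted hit region in all six reading frames. The function names (`reverse_complement`, `translate`, `six_frame`) are made up for illustration, the standard genetic code is assumed, and introns are ignored, so this only makes sense for a hit that lies within a single exon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Return a dict like {'+1': ..., '-3': ...} with all six translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCCTAA").items():
        print(name, protein)
```

Blat's PSL output reports the strand of each hit, which already narrows the choice to three frames; the frame with no internal stop codons (`*`) is usually the one to keep.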

Does anyone here know Blat? Or any nice, easy alternative? Exonerate seems to do the same and also outputs alignments against my putative translated proteins, but I don't know how to extract anything from this format.

EDIT: I'm getting close to it with:

exonerate --model  protein2genome  araport_genes.pep.fasta b_repanda.fasta    --showalignment no --showvulgar no --ryo ">%ti (%tab - %tae)\n%tas\n"
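The `--ryo` template in that command prints one FASTA-like record per alignment, so the output can be read back with a few lines of Python. A rough sketch, assuming the header layout `>%ti (%tab - %tae)` from the command above; `parse_ryo` and the sample text are hypothetical, and exonerate's own status lines (command line, hostname, completion message) are simply skipped.

```python
import re

# Header produced by the template ">%ti (%tab - %tae)".
HEADER = re.compile(r"^>(\S+) \((\d+) - (\d+)\)$")
# Sequence lines: IUPAC nucleotide letters only, so status lines never match.
SEQ_LINE = re.compile(r"^[ACGTUNRYSWKMBDHVacgtunryswkmbdhv]+$")


def parse_ryo(lines):
    """Collect records as dicts with target id, start, end and sequence."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "target": match.group(1),
                "start": int(match.group(2)),
                "end": int(match.group(3)),
                "seq": "",
            }
            records.append(current)
        elif current is not None and SEQ_LINE.match(line):
            # Sequence may be wrapped over several lines; join them.
            current["seq"] += line
    return records


# Made-up example output, for illustration only.
sample = """Command line: [exonerate ...]
Hostname: [localhost]
>scaffold_1 (100 - 118)
ATGGCCAAA
GGTTTCTAA
-- completed exonerate analysis
"""

records = parse_ryo(sample.splitlines())
for rec in records:
    print(rec["target"], rec["start"], rec["end"], rec["seq"])
```

For a real run, redirect exonerate's output to a file and pass the open file handle to `parse_ryo`. Note that `%tas` is the aligned target DNA, so these sequences still need to be translated to get proteins.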

Cheers, Ricardo

4 days ago by
JC9.3k wrote:

From the paper mentioned:

Contig identity was assigned with Blat v.35 using translated DNA against the respective exon reference sets, selecting the highest scoring hit, and contigs with score > 20 and percentage identity > 75% were retained

The authors didn't align the nucleotides from the genome directly; they translated the contigs and matched the translations to the respective proteins.

For your analysis, I think you can annotate your sequences using the closest species, then use Ensembl Plants to retrieve the phylogenetic group and add your sequences to extend the phylogeny.

3 days ago by
ricardoguerreiro212130 wrote:

I think I have found my ideal answer:

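Run exonerate with its default text alignment output, saved to a file.

Then in Python (with Biopython):

from Bio import SearchIO

# parse() yields one QueryResult per query protein
qresults = SearchIO.parse("exonerate_outfile", "exonerate-text")

for qresult in qresults:
    hsp = qresult[0][0]  # first HSP of the first hit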
