Question: Extracting homologous proteins from a genome (Blat or Exonerate)
ricardoguerreiro2121 (Germany) wrote, 4 days ago:

Hi,

I would like to quickly extract proteins from various novel plant genomes, by finding homology with documented proteins (ex: A. thaliana), for the purpose of phylogenetic analysis.

A recent paper uses an older tool, Blat, that does just that. But Blat's output is a table of hits (with coordinates). How do I turn this into proteins? I have written a script that slices my query DNA sequence using the hit coordinates, but this doesn't seem ideal: I would still have to translate the DNA, and there are six different reading frames to choose from.
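To make the six-frame problem concrete, here is a minimal sketch in plain Python. It assumes the standard genetic code and an unambiguous DNA string; the function names are made up for illustration and are not part of Blat or any library.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not real data.
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Picking the right frame by hand (for example, the one without internal stop codons) is error-prone, which is why a tool that aligns the protein to the genome directly is attractive.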

Does anyone here know Blat? Or a nice, easy alternative? Exonerate seems to do the same and also outputs alignments against my putative translated proteins, but I don't know how to extract anything from this format.

EDIT: I'm getting close to it with:

exonerate --model protein2genome araport_genes.pep.fasta b_repanda.fasta --showalignment no --showvulgar no --ryo ">%ti (%tab - %tae)\n%tas\n"

Cheers, Ricardo
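The `--ryo` template in the command above writes FASTA-like records with a header of the form `>target (begin - end)`. As a rough sketch, such output could be read back with plain Python as follows. The parser name and the sample lines are invented for illustration, and the code assumes that any surrounding log lines printed by exonerate are not purely alphabetic.

```python
import re

# Matches headers produced by the template ">%ti (%tab - %tae)".
HEADER = re.compile(r"^>(.+) \((\d+) - (\d+)\)$")


def parse_ryo_records(lines):
    """Collect records from --ryo output, skipping non-sequence lines."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "target": match.group(1),
                "start": int(match.group(2)),
                "end": int(match.group(3)),
                "seq": "",
            }
            records.append(current)
        elif current is not None and line.isalpha():
            # Sequence lines may be wrapped, so append to the open record.
            current["seq"] += line
        else:
            # Blank or log line: close the open record, if any.
            current = None
    return records


if __name__ == "__main__":
    # Made-up example output, not real exonerate results.
    sample = [
        "Command line: [exonerate ...]",
        "",
        ">scaffold_1 (100 - 112)",
        "ATGGCCAAATAA",
        "-- completed exonerate analysis",
    ]
    for rec in parse_ryo_records(sample):
        print(rec["target"], rec["start"], rec["end"], rec["seq"])
```

The extracted sequence is still target DNA at this point, so it would need translating before it can go into a protein phylogeny.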

JC (Mexico) wrote, 4 days ago:

From the paper mentioned:

Contig identity was assigned with Blat v.35 using translated DNA against the respective exon reference sets, selecting the highest scoring hit, and contigs with score > 20 and percentage identity > 75% were retained

The authors didn't align the genome nucleotides directly; they ran Blat in translated mode, comparing the translated contigs against the respective reference sets.
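The selection rule quoted above (best hit per contig, then score > 20 and identity > 75%) could be sketched on Blat's tab-separated PSL output roughly as follows. This is only an approximation under stated assumptions: the score and identity formulas are simplified stand-ins rather than Blat's exact calculations, contigs are assumed to be the query sequences, and all function names are hypothetical.

```python
def parse_psl_line(line):
    """Pick the PSL columns needed here (standard 21-column layout)."""
    f = line.rstrip("\n").split("\t")
    return {
        "matches": int(f[0]),
        "mismatches": int(f[1]),
        "rep_matches": int(f[2]),
        "q_num_insert": int(f[4]),
        "t_num_insert": int(f[6]),
        "q_name": f[9],
        "t_name": f[13],
        "t_start": int(f[15]),
        "t_end": int(f[16]),
    }


def simple_score(hit):
    """Simplified score: matches minus mismatches and gap openings (assumption)."""
    return (hit["matches"] + hit["rep_matches"] // 2
            - hit["mismatches"] - hit["q_num_insert"] - hit["t_num_insert"])


def simple_identity(hit):
    """Approximate percent identity over aligned bases (assumption)."""
    aligned = hit["matches"] + hit["rep_matches"] + hit["mismatches"]
    if aligned == 0:
        return 0.0
    return 100.0 * (hit["matches"] + hit["rep_matches"]) / aligned


def best_hits(lines, min_score=20, min_identity=75.0):
    """Keep the top-scoring hit per query, then apply both thresholds."""
    best = {}
    for line in lines:
        # Skip the optional PSL header and blank lines.
        if not line.strip() or not line.split("\t")[0].isdigit():
            continue
        hit = parse_psl_line(line)
        prev = best.get(hit["q_name"])
        if prev is None or simple_score(hit) > simple_score(prev):
            best[hit["q_name"]] = hit
    return {
        name: hit for name, hit in best.items()
        if simple_score(hit) > min_score and simple_identity(hit) > min_identity
    }
```

Calling `best_hits(open("hits.psl"))` (the file name is a placeholder) would return one retained hit per contig, keyed by contig name.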

For your analysis, I think you can annotate your sequences using the closest species, then use Ensembl Plants to retrieve the phylogenetic group and add your sequences to extend the phylogeny.

ricardoguerreiro2121 (Germany) wrote, 3 days ago:

I think I have found my ideal answer:

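Run exonerate and save its text alignment output to a file.

Then in Python (with Biopython installed):

from Bio import SearchIO

# Parse exonerate's plain-text alignment output, one result per query.
qresults = SearchIO.parse("exonerate_outfile", "exonerate-text")

for qresult in qresults:
    # Take the first HSP of the first hit for this query.
    hsp = qresult[0][0]

    # Print the hit sequence of the first alignment fragment.
    print(str(hsp.hit_all[0].seq))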