Question: Selecting primary transcript for each locus from a GFF file
arnstrm (Ames, IA) wrote 3.6 years ago:

Hello,

I am planning to use Maker-predicted genes to identify orthologs among closely related species. However, Maker has predicted multiple transcripts for each locus (because multiple gene predictors were used in Maker, and because genes have multiple isoforms). Although I am using only predictions with AED scores <1.0, I still have many models for each locus. My question is: what is the best way to choose a transcript for a region? Should I select the longest coding sequence for that region? Is there any program that can perform this step?

Thanks for any help!

 

h.mon (Brazil) wrote 3.6 years ago:

You could use EVidenceModeler (EVM) to get a consensus prediction.

Another approach could be to cluster orthologs using all predicted genes, then prune the clusters using some criterion (the longest transcript is not necessarily the best). The Agalma pipeline uses this latter approach, though the paper does not detail how this is performed (and Agalma is designed primarily for RNA-seq data sets).
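As one possible pruning criterion, here is a minimal Python sketch that keeps a single mRNA per gene from a GFF3 file: lowest AED first, then longest total CDS as a tie-break. This is not what Agalma or EVidenceModeler do; it is only an illustration. It assumes that each mRNA line carries its AED in an `_AED` attribute and that CDS features point to their mRNA through `Parent`. The function names (`parse_attributes`, `pick_primary_transcripts`) are made up for this example.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def pick_primary_transcripts(gff_lines, aed_key="_AED"):
    """Return {gene_id: mrna_id}, preferring lowest AED, then longest CDS.

    Assumption: mRNA lines store the AED under `aed_key`; a missing
    value is treated as 1.0 (the worst score).
    """
    mrna_gene = {}               # mRNA ID -> parent gene ID
    mrna_aed = {}                # mRNA ID -> AED score
    cds_len = defaultdict(int)   # mRNA ID -> summed CDS length in bp

    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                # an embedded sequence section follows
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "mRNA":
            mrna_gene[attrs["ID"]] = attrs["Parent"]
            mrna_aed[attrs["ID"]] = float(attrs.get(aed_key, 1.0))
        elif ftype == "CDS":
            # A CDS segment may list several parents, separated by commas
            for parent in attrs.get("Parent", "").split(","):
                cds_len[parent] += end - start + 1

    by_gene = defaultdict(list)
    for mrna, gene in mrna_gene.items():
        by_gene[gene].append(mrna)

    # Sort key: low AED, then long CDS, then ID for a reproducible choice
    return {
        gene: min(mrnas, key=lambda m: (mrna_aed[m], -cds_len[m], m))
        for gene, mrnas in by_gene.items()
    }
```

You would call it with an open file handle, e.g. `pick_primary_transcripts(open("genes.gff"))` (the file name is a placeholder), and then use the returned mRNA IDs to subset the protein FASTA before ortholog clustering.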

Joseph Pearson (UNC Chapel Hill) wrote 3.6 years ago:

Many genes will have multiple splice variants with identical CDSs, so choosing the longest CDS might not be sufficient. You could use the most strongly expressed RNA (sort by gene, then by expression, and filter for the best using awk or Excel), but that will frequently differ between tissues. In summary, there's a good reason why multiple transcripts exist; there is not one "best" transcript. That being said, most transcript variants will be substantially similar, so if you arbitrarily choose among mRNAs with similar evidence (functional genomics/transcriptomic data), you will be able to identify orthologs from the common regions of each transcript.
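The "sort by gene, then expression, and keep the best" step can also be done in a few lines of Python instead of awk or Excel. This is only a sketch: it assumes you already have one (gene, transcript, expression) record per transcript from whatever quantification you ran, and the function name `top_expressed` is hypothetical. Ties are broken by transcript ID, which is arbitrary but reproducible.

```python
from itertools import groupby
from operator import itemgetter


def top_expressed(rows):
    """Pick the most strongly expressed transcript for each gene.

    rows: iterable of (gene_id, transcript_id, expression) tuples,
    where expression is a number (units are whatever your data uses).
    Returns {gene_id: transcript_id}.
    """
    # Sort by gene, then by descending expression, then by transcript ID
    ordered = sorted(rows, key=lambda r: (r[0], -r[2], r[1]))
    # After sorting, the first record of each gene group is the winner
    return {
        gene: next(group)[1]
        for gene, group in groupby(ordered, key=itemgetter(0))
    }
```

Because the result depends on the tissue or condition the expression values come from, it is worth recording which sample was used when the selection is made.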
