Question: A Doubt About Gene Prediction
0
5.7 years ago by
Ontheway10
Ontheway10 wrote:

Hi, I used Glimmer to predict ORFs in a draft genome, and I found that some ORFs have an insertion or deletion relative to their homologous genes in other genomes. But when I used these homologous genes to search the draft genome sequence, I got a complete match over the full length. So is the predicted ORF wrong, or does the prediction need to be tuned again? How can I get credible gene prediction results? Thank you for your reply.

gene • 1.7k views
modified 2.8 years ago by Biostar ♦♦ 20 • written 5.7 years ago by Ontheway10
2

Need more details: how did you do the search, which BLAST program did you use (blastp, tblastx; DNA or AA database), and is it a prokaryote?
It might be that there are multiple copies of your test set of genes in your draft genome and one has a frameshift. Use tblastx to detect those. (Also, Glimmer doesn't "predict ORFs"; ORFs do not need to be predicted. It predicts whether ORFs are protein-coding or not.)

modified 5.7 years ago • written 5.7 years ago by Michael Dondrup46k

I used the gene sequence to search the genome sequence (blastn), and the genome is a prokaryotic genome. There are no other copies of the test genes. The ORF is located within the aligned region of the genome. If I use the protein sequence to search the genome and get a local alignment result, can I conclude that this genome has the protein-coding gene? Thank you!

written 5.7 years ago by Ontheway10

So, when you blastn the DNA sequence of gene A (draft) against B, you get an insertion or deletion, but when you blast B against A with the same parameters you do not get one at the same coordinates? Is that what you are trying to say? That would worry me slightly; however, it is most likely not true. Could you post an example?

modified 5.7 years ago • written 5.7 years ago by Michael Dondrup46k

There is a complete genome A with a gene a (100 bp), and a draft genome B with a predicted ORF b (90 bp). When I align gene a with ORF b, there is a 10 bp deletion at the N terminus of ORF b ('*' means match, and '-' means deletion):

a: ***********************************************************
b: -----*******************************************************

When I use gene a to search genome B, there is an alignment over the full length:

a: ***********************************************************
B: ***********************************************************
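One way to make this comparison concrete is to put the BLAST hit of gene a and the predicted ORF b on the same genome B coordinates and see how far the hit extends past each end of the ORF. The sketch below is only an illustration: the function name `end_offsets` and the coordinates are hypothetical, and it assumes both features lie on the forward strand with 1-based inclusive positions.

```python
def end_offsets(hit, orf):
    """Return how many bases the hit extends beyond the ORF at each end.

    hit, orf: (start, end) tuples, 1-based inclusive, forward strand
    (an assumption for this sketch). A positive first value means the
    hit starts upstream of the predicted ORF start.
    """
    hit_start, hit_end = hit
    orf_start, orf_end = orf
    return orf_start - hit_start, hit_end - orf_end


# Hypothetical coordinates on genome B: a 100 bp hit and a 90 bp ORF.
hit_of_gene_a = (1001, 1100)
predicted_orf_b = (1011, 1100)
upstream, downstream = end_offsets(hit_of_gene_a, predicted_orf_b)
print(f"hit extends {upstream} bp upstream and {downstream} bp downstream of the ORF")
```

If the only difference is at the 5' end, the bases are present in the genome and the disagreement is about where the predicted ORF starts.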
modified 5.7 years ago by Michael Dondrup46k • written 5.7 years ago by Ontheway10

I think you have been tricked by the alignment heuristic, which somehow doesn't score the more complete alignment higher than the incomplete one. You can use SSEARCH or EMBOSS water if in doubt, but I wouldn't be worried.
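SSEARCH and EMBOSS water both compute rigorous Smith-Waterman local alignments rather than relying on a seeding heuristic. To show what that means, here is a minimal Smith-Waterman sketch in Python. The scoring values (match 2, mismatch -1, linear gap -2) and the toy sequences are assumptions chosen for illustration, not the defaults of either tool, and real runs should use the tools themselves.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Uses a linear gap penalty (an assumption of this sketch).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example: a short "gene" embedded in a longer "genome" fragment.
gene = "ATGGCTAGCTTAGGC"
genome = "CCCC" + gene + "TTTT"
best_score, aln_gene, aln_genome = smith_waterman(gene, genome)
print(best_score)
print(aln_gene)
print(aln_genome)
```

Because every cell of the matrix is evaluated, the full-length alignment is found whenever it is the highest-scoring one under the chosen scores.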

written 5.7 years ago by Michael Dondrup46k
3
5.7 years ago by
Bill Pearson860
Bill Pearson860 wrote:

This is a well-understood problem with using ORF finders like Glimmer (and pretty much anything else) on data that is likely to have insertion/deletion (frameshift) errors. When FASTX was published in 1997 (Pearson et al. (1997) Genomics 46:24-36), we showed that there were many genes in a recently sequenced bacterial genome that could be extended by alignment with frameshifts.

I would argue that there is no reason to look for open reading frames. FASTX (and BLASTX, but you need to turn on the option that allows frame-shifts) will find all the genes you can find with ORF-finders, and more (because they are not limited to a minimum ORF length).
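To illustrate why a single insertion/deletion error truncates an ORF, here is a small Python sketch that translates a toy sequence in the three forward frames before and after deleting one base. The sequence and helper names are made up for illustration, and this is not how FASTX or BLASTX work internally; it only shows the effect that frameshift-aware alignment is designed to recover from.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2); '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


# Hypothetical toy gene and a copy with one base deleted (a sequencing error).
gene = "ATGGCTAAAGGTCTGGAATTCTAA"
shifted = gene[:11] + gene[12:]

print("intact :", translate(gene))
for frame, protein in enumerate(three_frames(shifted)):
    print(f"frame {frame}:", protein)
```

After the deletion, the start of the protein is still read in frame 0 but the rest of it continues in a different frame, so an ORF finder restricted to one frame reports a shortened or altered gene, while an alignment that allows frameshifts can join the two pieces.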

written 5.7 years ago by Bill Pearson860