Source Of Predicted Genes In Ensembl
13.2 years ago
Pals ★ 1.3k

The Ensembl database contains three types of genes: known genes, novel genes and predicted genes. As we know, Ensembl does not use predicted proteins and mRNA transcripts (those beginning with XP and XM, respectively) in the Genebuild. If that is the case, where do predicted genes come from?

ensembl database gene • 4.7k views
13.2 years ago

You're correct that every Ensembl transcript goes back to a cDNA or protein (and none to XP or XM records). The "known" and "novel" status is actually determined after the genebuild. Once the Ensembl transcript set has been "built", the transcripts are compared against public scientific databases. If there is a sequence match to a protein or cDNA from the same species, the transcript is classified as "known". If there is no match, the transcript is classified as "novel". We also have a "known by projection" classification for Ensembl transcripts whose sequence match comes from another species. This classification is more common in species with little cDNA or protein data available, where a homology build had to be used.
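As a rough illustration of the classification described above, here is a minimal sketch in Python. It is not Ensembl pipeline code: `EvidenceMatch`, `classify_transcript` and the field names are hypothetical, and it assumes the sequence matches against public databases have already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceMatch:
    """A sequence match between a built transcript and a public database record."""
    accession: str  # hypothetical identifier of the matched cDNA or protein
    species: str    # species the matched record belongs to


def classify_transcript(transcript_species, matches):
    """Return 'known', 'known by projection' or 'novel' for one transcript."""
    # A match from the same species makes the transcript "known".
    if any(m.species == transcript_species for m in matches):
        return "known"
    # Matches exist, but only from other species.
    if matches:
        return "known by projection"
    # No sequence match at all.
    return "novel"
```

The same-species check comes first, so a transcript with matches in both its own and another species is still reported as "known".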

I'm not sure where you are seeing "predictions". These might be our Genscan predictions. The genebuilders start with Genscan predictions, which overpredict, in their initial alignments of cDNAs and proteins to the genome. The Genscan predictions are not used as supporting evidence and do not lead to transcripts in the Ensembl gene sets. They are only there to support the annotation pipeline:

http://www.ensembl.org/info/docs/genebuild/genome_annotation.html

Where exactly in the database did you find them (can you show me your query)?

This type of question is great for our helpdesk (helpdesk[?]ensembl.org). Feel free to continue this discussion there; sending your script or query along would help.


Yes, that is a good interpretation: novel genes are predicted based on evidence from another species. Can I ask which resource you were using (could you send me the link)? We would like to retire some of that outdated documentation!


Thank you very much for your detailed explanation. I am sorry for misinterpreting the information by relying on old literature. I was reading an online resource as part of my course, and that is where I found those three types. The Ensembl link you provided explains the types as well as the Genebuild method in general.

I have now come to the conclusion that Ensembl does make predictions, but those predictions are based on existing orthologous proteins in other species, so they actually result in "novel genes", not "predicted genes".


I found it in my own web course. I will discuss this with the teacher and ask him to update the resource.

Thank you once again.


Thank you for speaking with the teacher. We do have some tutorials here, if your teacher wants to use them: http://www.ensembl.org/info/website/tutorials/index.html


I think "predicted gene" still appears in descriptions for genomes other than human (for instance in mouse, Mm, as in the screenshot I attached). As I understand it, other species may have descriptions that include known, novel and predicted genes, but in human we only have known and novel genes, as far as I checked.
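One way to repeat this kind of check on an exported gene list is sketched below. It is only an illustration: `split_by_predicted` is a hypothetical helper, the input is assumed to be (gene ID, description) pairs you have already downloaded, and the records shown are made up rather than real Ensembl entries.

```python
def split_by_predicted(genes):
    """Split (gene_id, description) pairs on whether the description says 'predicted gene'."""
    predicted, other = [], []
    for gene_id, description in genes:
        # Descriptions can be missing, so treat None as an empty string.
        if "predicted gene" in (description or "").lower():
            predicted.append(gene_id)
        else:
            other.append(gene_id)
    return predicted, other


# Made-up records for illustration only; not real Ensembl entries.
example = [
    ("GENE_A", "predicted gene 0001"),
    ("GENE_B", "example kinase 1"),
    ("GENE_C", None),
]
predicted_ids, other_ids = split_by_predicted(example)
print(predicted_ids)
```

Running the same function over exports for two species would show whether the "predicted gene" wording is present in one and absent in the other.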
