Question: GENCODE versus Ensembl gene annotations
27 days ago by
United States
igor4.5k wrote:

What is the difference between GENCODE and Ensembl annotation? That's actually the first question in the GENCODE FAQs:

The GENCODE annotation is made by merging the Havana manual gene annotation and the Ensembl automated gene annotation. ...In practical terms, the GENCODE annotation is identical to the Ensembl annotation.

I am looking at the mouse data for GENCODE M15 compared to Ensembl 90, which should be comparable according to both sources. The total number of transcripts is 131,100 vs 131,195, so that difference is negligible. However, some subsets are very different. The number of protein-coding genes is 21,950 vs 22,598, which is a little more noticeable. The number of long non-coding RNA genes is 11,975 vs 8,980, a drop of about 25%. Thus, it seems the annotations are not really identical. Are those differences real, or are the two sources just counting the gene biotypes differently?

gencode ensembl gtf • 216 views
23 days ago by
Astrid_Ensembl30 wrote:

The Gencode statistics webpage refers to the annotation on the reference chromosomes only, which has 131,100 transcripts.

The Ensembl statistics include all primary assembly regions, which explains the higher number of transcripts (131,195).
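
To check this on your own files, here is a minimal Python sketch that counts transcript records on the reference chromosomes versus all sequences in a GTF. It assumes the standard 9-column tab-separated GTF layout, and it assumes the mouse reference chromosomes are 1-19, X, Y and the mitochondrion, written with a "chr" prefix in GENCODE files and without one in Ensembl files. The function names are my own, not part of any GENCODE or Ensembl tool.

```python
# Assumption: mouse reference chromosomes are 1-19, X, Y and the mitochondrion
# ("M" in GENCODE naming, "MT" in Ensembl naming). Anything else is treated as
# a scaffold or other non-chromosomal sequence.
REFERENCE_NAMES = {str(i) for i in range(1, 20)} | {"X", "Y", "M", "MT"}


def is_reference_chromosome(seqname):
    """Return True if a GTF sequence name looks like a reference chromosome."""
    name = seqname[3:] if seqname.startswith("chr") else seqname
    return name in REFERENCE_NAMES


def count_transcripts(gtf_lines):
    """Count transcript records: (on reference chromosomes, on all sequences)."""
    on_reference = 0
    total = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only transcript features
        total += 1
        if is_reference_chromosome(fields[0]):
            on_reference += 1
    return on_reference, total
```

Passing an open file handle (for example from `gzip.open(path, "rt")`, where `path` is wherever you saved the GTF) works because the function only iterates over lines. If the explanation above holds, the first number should correspond to the GENCODE webpage figure and the second to the Ensembl one.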

Furthermore, the grouping of the gene biotypes for the statistics webpages differs between Gencode and Ensembl:

Gencode includes only the 21,950 genes with a "protein_coding" biotype under the "Protein-coding genes" category on the webpage. Ensembl reports in the "Coding genes" category all genes that contain an ORF, which adds the IG/TR genes for example.

In the case of the long non-coding RNA genes, the difference is due to the inclusion of the TEC (To be Experimentally Confirmed) genes by Gencode (~3000), which matches the gap of 2,995 genes between 11,975 and 8,980.
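
The effect of the two groupings can be reproduced by counting gene biotypes straight from a GTF and then summing them either way. The sketch below is only an illustration: it assumes the biotype is stored under the attribute key `gene_type` (GENCODE files) or `gene_biotype` (Ensembl files), that IG/TR gene biotypes are named like `IG_V_gene` or `TR_J_gene`, and that the caller supplies the list of biotypes counted as long non-coding RNA. The function names are hypothetical, and the real category definitions on the statistics webpages may include more biotypes than these.

```python
import re
from collections import Counter

# Assumption: GENCODE GTFs use gene_type "..." and Ensembl GTFs use
# gene_biotype "..." in the attribute column; this pattern accepts either.
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def gene_biotype_counts(gtf_lines):
    """Count gene records per biotype in a GTF."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene features
        match = BIOTYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def coding_gene_total(counts, include_ig_tr):
    """Sum protein_coding genes, optionally adding IG/TR genes as well."""
    total = counts["protein_coding"]
    if include_ig_tr:
        # Assumption: IG/TR gene biotypes look like IG_*_gene or TR_*_gene;
        # pseudogene biotypes (e.g. IG_V_pseudogene) do not end in "_gene".
        total += sum(
            n for biotype, n in counts.items()
            if biotype.startswith(("IG_", "TR_")) and biotype.endswith("_gene")
        )
    return total


def lncrna_gene_total(counts, lncrna_biotypes, include_tec):
    """Sum the caller-supplied lncRNA biotypes, optionally adding TEC genes."""
    total = sum(counts[b] for b in lncrna_biotypes if b != "TEC")
    if include_tec:
        total += counts["TEC"]
    return total
```

Calling `coding_gene_total(counts, include_ig_tr=False)` mimics the Gencode-style "Protein-coding genes" count, while `include_ig_tr=True` moves toward the Ensembl-style "Coding genes" count. Likewise, toggling `include_tec` in `lncrna_gene_total` shows how much of the lncRNA difference comes from the TEC genes alone.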
