Question: Gene name and ensembl ids
0
29 days ago by
kingkenshin19970 wrote:

Why does a single gene name, say MAPK3, have multiple Ensembl IDs and multiple FASTA sequences? Isn't there supposed to be a single FASTA sequence for each gene name?

sequence gene • 153 views
ADD COMMENTlink written 29 days ago by kingkenshin19970
3

Hi, please google isoforms and alternative splicing.

ADD REPLYlink written 29 days ago by ATpoint14k
2

Not necessarily. There can be more than one transcript variant per gene.

ADD REPLYlink written 29 days ago by genomax63k

In fact, checking the latest GENCODE release for human, there are 58381 annotated genes. Of these, 36076 genes have more than one annotated transcript. Summary statistics (transcripts per gene):

Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   3.491   4.000 192.000

and quantiles:

10% 20% 30% 40% 50% 60% 70% 80% 90% 95% 99% 
  1   1   1   1   1   1   3   5  10  14  24

Note that this of course includes many single-exon genes, such as small RNA species, and the picture probably changes for classical protein-coding genes.
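For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. It assumes a GTF-style annotation in which each transcript has its own `transcript` record carrying a `gene_id "..."` attribute. The function name and the tiny example records (GENE_A, GENE_B) are made up for illustration and are not real annotation, and this is not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Matches the gene_id attribute in the 9th (attributes) column of a GTF line.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def transcripts_per_gene(gtf_lines):
    """Count 'transcript' records per gene_id in GTF-formatted lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 is the feature type; only transcript records are counted.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = GENE_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up records for illustration only (not real annotation).
EXAMPLE_GTF = [
    "##made-up example\n",
    'chr1\tdemo\tgene\t1\t900\t.\t+\t.\tgene_id "GENE_A";\n',
    'chr1\tdemo\ttranscript\t1\t900\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n',
    'chr1\tdemo\ttranscript\t1\t700\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A2";\n',
    'chr2\tdemo\tgene\t5\t300\t.\t-\t.\tgene_id "GENE_B";\n',
    'chr2\tdemo\ttranscript\t5\t300\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1";\n',
]

counts = transcripts_per_gene(EXAMPLE_GTF)
genes_with_several = sum(1 for n in counts.values() if n > 1)
print(dict(counts), genes_with_several)
```

On a real annotation you would pass an open file handle instead of the example list (use `gzip.open(path, "rt")` for a compressed file) and then summarise `counts.values()`.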

ADD REPLYlink modified 29 days ago • written 29 days ago by ATpoint14k
1

I think this would be even higher if you limited it to the ~20,000 protein-coding genes. Very few protein-coding genes have only one transcript.

ADD REPLYlink written 29 days ago by i.sudbery4.0k
1

...in most eukaryotes...

ADD REPLYlink written 28 days ago by Friederike3.2k

Quite right... Sorry, mammal focused again!

This is possibly not even true for most eukaryotes, just most mammals. I don't think (for example) most Arabidopsis genes have multiple transcripts annotated. Last time I checked, it might not even have been true for flies (although that was a while ago).

ADD REPLYlink written 28 days ago by i.sudbery4.0k

True: Only protein-coding:

 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   5.000   7.377  10.000 192.000 

10% 20% 30% 40% 50% 60% 70% 80% 90% 95% 99% 
  1   2   3   4   5   7   9  11  16  20  31
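
A rough sketch of how the protein-coding restriction could be applied in Python. It assumes GENCODE-style attributes, where the biotype is stored as `gene_type` (Ensembl's own GTF files call it `gene_biotype`, so the key would need changing there). The function name and the example records are made up for illustration.

```python
import re
import statistics
from collections import Counter

# Captures quoted key/value pairs such as: gene_id "X"; gene_type "Y";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')

def transcript_counts_by_type(gtf_lines, gene_type="protein_coding"):
    """Count transcript records per gene, keeping one gene_type only."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if attrs.get("gene_type") == gene_type and "gene_id" in attrs:
            counts[attrs["gene_id"]] += 1
    return counts

def record(gene, tx, biotype):
    """Build one made-up transcript line (illustration only)."""
    attrs = f'gene_id "{gene}"; transcript_id "{tx}"; gene_type "{biotype}";'
    return f"chr1\tdemo\ttranscript\t1\t900\t.\t+\t.\t{attrs}\n"

EXAMPLE_GTF = [
    record("GENE_A", "TX_A1", "protein_coding"),
    record("GENE_A", "TX_A2", "protein_coding"),
    record("GENE_A", "TX_A3", "protein_coding"),
    record("GENE_B", "TX_B1", "protein_coding"),
    record("GENE_C", "TX_C1", "snRNA"),
]

coding = transcript_counts_by_type(EXAMPLE_GTF)
median_transcripts = statistics.median(coding.values())
print(dict(coding), median_transcripts)
```

The non-coding GENE_C record is dropped by the filter, so only the two protein-coding genes contribute to the median.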
ADD REPLYlink written 29 days ago by ATpoint14k
3
29 days ago by
i.sudbery4.0k
Sheffield, UK
i.sudbery4.0k wrote:

There are two sorts of ENSEMBL ID. The first is the gene id. The gene MAPK3 maps to a single ENSEMBL gene id in human - ENSG00000102882. The other sort of ID is the ENSEMBL transcript id. As MAPK3 has several transcripts, there are several ENSEMBL transcript ids.
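
As a small illustration of the two kinds of ID, the sketch below tells them apart by prefix. It assumes human stable IDs of the form ENSG/ENST/ENSP followed by 11 digits and an optional `.version` suffix; IDs from other species carry an extra species code and are deliberately reported as "unknown" here. The function name is made up, and the transcript ID in the example is a placeholder, not a real MAPK3 transcript.

```python
import re

# Human Ensembl stable IDs: ENS + feature letter + 11 digits (+ optional version).
HUMAN_ENSEMBL_ID = re.compile(r"^ENS([GTP])\d{11}(?:\.\d+)?$")
KINDS = {"G": "gene", "T": "transcript", "P": "protein"}

def classify_ensembl_id(identifier):
    """Return 'gene', 'transcript', 'protein', or 'unknown' for a human ID."""
    match = HUMAN_ENSEMBL_ID.match(identifier)
    return KINDS[match.group(1)] if match else "unknown"

print(classify_ensembl_id("ENSG00000102882"))    # the MAPK3 gene ID from the answer
print(classify_ensembl_id("ENST00000000001.1"))  # placeholder transcript ID
print(classify_ensembl_id("MAPK3"))              # a gene symbol, not an Ensembl ID
```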

Note that there are cases where a single gene symbol has more than one ENSEMBL gene id. This is because HUGO (which decides gene symbols) and ENSEMBL (which assigns ENSEMBL ids) don't necessarily agree on what counts as which gene. For example, the gene IGF2 has two ids: ENSG00000129965 and ENSG00000167244. This is because there is a read-through transcript that incorporates parts of both the classic IGF2 ORF and the adjacent INS ORF. Ensembl has decided this represents two different genes (IGF2 and INS-IGF2), whereas HUGO only allocates a single symbol (IGF2).
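
If you have a table of (symbol, gene id) pairs, cases like this can be spotted with a few lines of Python. This is only a sketch: the function name is made up, and the input pairs are just the ones mentioned in this answer, typed in by hand rather than pulled from Ensembl.

```python
from collections import defaultdict

def symbols_with_multiple_gene_ids(pairs):
    """Map each symbol that has more than one distinct gene ID to its sorted IDs."""
    ids_by_symbol = defaultdict(set)
    for symbol, gene_id in pairs:
        ids_by_symbol[symbol].add(gene_id)
    return {
        symbol: sorted(ids)
        for symbol, ids in ids_by_symbol.items()
        if len(ids) > 1
    }

# Pairs taken from the text above, entered by hand for illustration.
PAIRS = [
    ("MAPK3", "ENSG00000102882"),
    ("IGF2", "ENSG00000129965"),
    ("IGF2", "ENSG00000167244"),
]

ambiguous = symbols_with_multiple_gene_ids(PAIRS)
print(ambiguous)
```

For downstream work it is usually safer to key tables on the gene id rather than the symbol, since the id stays unique even when the symbol does not.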

ADD COMMENTlink modified 29 days ago • written 29 days ago by i.sudbery4.0k