Question

What is the difference between a transcript ID and an Ensembl gene ID?

7.8 years ago

feng0049 • 0

Dear Biostars community, I am very new to genetic analysis. I have just finished extracting human RNA gene read counts from FASTQ files into raw count files, in order to run a differential expression analysis with the edgeR package. This link describes how I did that.

However, after I obtained the read counts file, I noticed a series of lines in the count file like this:

...
uc001adk.4  10
uc001adl.3  0
uc001adm.6  0
uc001ado.4  0
uc001adp.4  0
...

@Pierre Lindenbaum pointed out to me that ucxxxxxx.x is a transcript ID (thank you very much :) ). I have also noticed that there are other IDs like:

ENSG00000162367

which are known as Ensembl gene IDs.

May I know what the differences or connections between these two are?

And what is the connection between these different gene IDs and gene symbols?

Thank you all in advance!

RNA-Seq genome gene • 17k views

updated 7.8 years ago by EagleEye 7.5k • written 7.8 years ago by feng0049 • 0

"uc001adk.4 is an UCSC gene id"

In fact, this is a transcript ID.

7.8 years ago by Pierre Lindenbaum 161k

Thanks, I will edit this. So the transcript ID can then be converted to a gene ID?

7.8 years ago by feng0049 • 0

score 10 · Accepted Answer · 2016-06-28

The difference between an Ensembl gene ID and an Ensembl transcript ID is:

1) An Ensembl ID starting with ENSG (ENSGxxxx) identifies a gene, i.e. a genomic region (gene ID)

2) An Ensembl ID starting with ENST (ENSTxxxx) identifies a transcript (transcript ID)

3) An ENSTxxxx transcript is a splice variant (isoform) of the corresponding gene with an ENSGxxxx ID

4) One gene (ENSGxxxx / gene symbol) can have multiple corresponding transcript IDs (ENSTxxxx)

Example:

Gene ID ENSG00000236172 has 67 transcript variants (which means this one ENSGxxxx ID, or gene symbol, corresponds to 67 different ENSTxxxx IDs)

http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000236172;r=2:6615389-6650535

Click the "Show transcript table" button to see them.
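Because one gene can have several transcripts, transcript-level counts can be collapsed to gene level by summing over the transcripts of each gene. Here is a minimal Python sketch of that idea. The counts come from the excerpt in the question, but the transcript-to-gene mapping and the labels GENE_A / GENE_B are hypothetical placeholders, not real annotation. A real mapping would have to come from the same annotation source that produced the counts.

```python
from collections import defaultdict

# Transcript-level counts, copied from the count file excerpt in the question.
transcript_counts = {
    "uc001adk.4": 10,
    "uc001adl.3": 0,
    "uc001adm.6": 0,
    "uc001ado.4": 0,
    "uc001adp.4": 0,
}

# HYPOTHETICAL transcript -> gene mapping, for illustration only.
# GENE_A and GENE_B are placeholder labels, not real gene IDs or symbols.
transcript_to_gene = {
    "uc001adk.4": "GENE_A",
    "uc001adl.3": "GENE_A",
    "uc001adm.6": "GENE_B",
    "uc001ado.4": "GENE_B",
    "uc001adp.4": "GENE_B",
}


def sum_counts_by_gene(counts, mapping):
    """Sum transcript counts per gene; transcripts without a mapping are skipped."""
    gene_counts = defaultdict(int)
    for transcript_id, count in counts.items():
        gene_id = mapping.get(transcript_id)
        if gene_id is None:
            continue  # no gene known for this transcript
        gene_counts[gene_id] += count
    return dict(gene_counts)


gene_counts = sum_counts_by_gene(transcript_counts, transcript_to_gene)
print(gene_counts)  # {'GENE_A': 10, 'GENE_B': 0}
```

Whether simple summing is appropriate depends on how the reads were counted in the first place, so treat this as an illustration of the one-gene-to-many-transcripts relationship rather than a recommended pipeline step.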

Annotations:

Annotations are available from several different sources, such as:

a) UCSC

b) RefSeq

c) Ensembl

d) Gencode

etc. Each source has its own way of naming genes and transcripts. You should use a single annotation consistently throughout your analysis to avoid confusion.
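One quick way to check for mixed annotations is to classify each ID in a count file by its naming pattern. The sketch below is a rough heuristic in Python: the regular expressions are my assumptions based on the usual shapes of these IDs (human Ensembl ENSG/ENST with 11 digits, UCSC "uc" IDs, RefSeq NM_/NR_/XM_/XR_ accessions), and the function names are made up for this example. GENCODE uses Ensembl-style IDs, so it cannot be told apart from Ensembl this way.

```python
import re

# Heuristic patterns (assumptions, not an official specification).
ID_PATTERNS = [
    ("Ensembl gene", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("Ensembl transcript", re.compile(r"^ENST\d{11}(\.\d+)?$")),
    ("UCSC transcript", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),
    ("RefSeq transcript", re.compile(r"^(NM|NR|XM|XR)_\d+(\.\d+)?$")),
]


def classify_id(identifier):
    """Return a label for the first pattern that matches, else 'unknown'."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


def id_types(identifiers):
    """Return the set of distinct ID types found in a list of identifiers."""
    return {classify_id(identifier) for identifier in identifiers}


# IDs taken from the question above.
ids = ["uc001adk.4", "uc001adl.3", "ENSG00000162367"]
for identifier in ids:
    print(identifier, "->", classify_id(identifier))

# More than one type in the same file suggests mixed annotations.
mixed = len(id_types(ids)) > 1
print("mixed annotations:", mixed)
```

If a single count file reports more than one ID type, that is a hint that annotations from different sources were mixed somewhere upstream.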