Question

Difference Between Gene And Transcript In The Datasets

12.6 years ago

Ss ▴ 50

I have been working with different datasets for a month now but I get easily confused.

I need a dataset of human genes, and I tried downloading one from the FTP sites of several databases.

OK, I understand that a gene can have different exons and that splicing can produce different transcripts. But then every transcript has the same start (bp) and end (bp). Shouldn't those differ between transcripts, or am I missing something?

I downloaded a dataset, and a subset of it looks like this:

Ensembl Gene ID    Ensembl Transcript ID    Chromosome Name    Gene Start (bp)    Gene End (bp)    Strand    Associated Gene Name    Associated Transcript Name    Status (gene)
ENSG00000016402    ENST00000316649    6    137321108    137366298    -1    IL20RA    IL20RA-001    KNOWN
ENSG00000016402    ENST00000468393    6    137321108    137366298    -1    IL20RA    IL20RA-002    KNOWN
ENSG00000016402    ENST00000367746    6    137321108    137366298    -1    IL20RA    IL20RA-003    KNOWN
ENSG00000016402    ENST00000461799    6    137321108    137366298    -1    IL20RA    IL20RA-005    KNOWN
ENSG00000016402    ENST00000460306    6    137321108    137366298    -1    IL20RA    IL20RA-004    KNOWN
ENSG00000016402    ENST00000541547    6    137321108    137366298    -1    IL20RA    IL20RA-202    KNOWN

In this example, chr_start and chr_end are the same for all transcripts.

database gene transcript ensembl biomart • 10k views

updated 11.0 years ago by Raghav ▴ 100 • written 12.6 years ago by Ss ▴ 50

score 7 · Answer 1 · 2011-09-06

As the header says ("Gene Start (bp)", "Gene End (bp)"), your columns show the start and end positions of the GENE, not of the TRANSCRIPTs. All of those transcripts, each with its own transcription start and end, belong to the same GENE (ENSG00000016402) on chromosome 6: chr6:137321108-137366298.
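To make the relationship concrete, here is a minimal Python sketch, assuming that a gene's range is the span covering all of its transcripts. The transcript labels (`tx_A`, `tx_B`, `tx_C`) and their coordinates are invented for illustration; only the gene ID and the overall gene range come from the table in the question.

```python
# Hypothetical transcript rows: (gene_id, transcript_label, start, end).
# The per-transcript coordinates are made up; they are NOT the real
# coordinates of the IL20RA transcripts listed in the question.
transcripts = [
    ("ENSG00000016402", "tx_A", 137321108, 137366298),
    ("ENSG00000016402", "tx_B", 137330000, 137366298),
    ("ENSG00000016402", "tx_C", 137321108, 137350000),
]


def gene_spans(rows):
    """Return {gene_id: (min transcript start, max transcript end)}."""
    spans = {}
    for gene_id, _label, start, end in rows:
        if gene_id in spans:
            lo, hi = spans[gene_id]
            spans[gene_id] = (min(lo, start), max(hi, end))
        else:
            spans[gene_id] = (start, end)
    return spans


spans = gene_spans(transcripts)
print(spans["ENSG00000016402"])  # (137321108, 137366298)
```

Every row of a gene-level export repeats this one span, which is why the columns look identical across transcripts. To see per-transcript coordinates in BioMart, you would add the transcript start and end attributes to the export rather than (or in addition to) the gene start and end; the exact attribute labels may vary between releases.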

score 1 · Answer 2 · 2013-04-03

On the difference between a gene and a transcript in the database: a gene entry spans the whole locus, from the TSS and 5' UTR through the start codon and the termination codon to the 3' UTR. Its range is therefore at least as large as that of any single transcript, which describes only one isoform of that gene (its exons and, for a coding transcript, its CDS).