Question: TSS / TTS in Ensembl gene annotation?
4.6 years ago by
thefirstrealace30 wrote:

Hello, I have a question about a gene annotation I recently downloaded in GFF3 format. Below is an abbreviated example containing the first few lines of the file:

##gff-version 3
# Generated on Tue Nov 27 19:25:49 2012
# UCSC table file ./ucsc_tables/hg19/ensGene.txt
chr1    ensGene    gene       11869    14412    .    +    .    Name=...
chr1    ensGene    ncRNA    11869    14409    .    +    .    Name=...
chr1    ensGene    exon       11869    12227    .    +    .    Name=...
chr1    ensGene    exon       12613    12721    .    +    .    Name=...



chr1    ensGene    gene       14363    29806    .    -    .    Name=...
chr1    ensGene    ncRNA    14363    29370    .    -    .    Name=...
chr1    ensGene    exon       14363    14829    .    -    .    Name=...



As shown above, each gene is followed by an arbitrary number of exons.

My question: is it correct to assume that the start and end coordinates of a listed gene represent the TSS and TTS?

I need these two positions to measure the distance to certain alternative splicing events, which I computed with MISO (unfortunately, the MISO output doesn't provide them).


Best regards

gene annotation gff ensembl • 6.3k views
modified 4.6 years ago by Emily_Ensembl21k • written 4.6 years ago by thefirstrealace30

My old question on this subject may help you, with adjustments for your genome of interest. It includes some scripting to grab TSS coordinates from an Ensembl GTF or via their Perl API. You will need to take the strand of each annotation into account to derive a useful TSS value from its coordinates.

modified 4.6 years ago • written 4.6 years ago by Alex Reynolds30k
4.6 years ago by
Emily_Ensembl21k wrote:

The start coordinate of forward strand genes and the end coordinate of negative strand genes will represent the TSS of the most 5' transcript of the gene. Other transcripts of the gene will have different TSSs. To get all TSSs, you should use the cDNA features in the file.

written 4.6 years ago by Emily_Ensembl21k
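
To make the strand rule in the answer above concrete, here is a minimal Python sketch that reads the transcript-level rows of a GFF3 file and reports a TSS and TTS for each one. It is an illustration under assumptions, not Ensembl's own method: the function name `transcript_ends` and the set `TRANSCRIPT_TYPES` are made up for this example, and the feature types in that set are a guess at what column 3 contains in a given file (the excerpt in the question uses `ncRNA`). Coordinates are taken as 1-based and inclusive, as in GFF3.

```python
# Feature types treated as transcript-level rows. This set is an assumption;
# adjust it to whatever your file actually uses in column 3.
TRANSCRIPT_TYPES = {"ncRNA", "mRNA", "transcript"}


def transcript_ends(gff_lines, transcript_types=TRANSCRIPT_TYPES):
    """Return a list of (chrom, strand, tss, tts) tuples, one per transcript row."""
    results = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        # Real GFF3 is tab-delimited; splitting on any whitespace also copes
        # with the space-padded excerpt pasted in the question.
        fields = line.split(None, 8)
        if len(fields) < 8:
            continue  # not a complete feature row
        chrom, _source, ftype, start, end, _score, strand = fields[:7]
        if ftype not in transcript_types:
            continue
        start, end = int(start), int(end)
        if strand == "+":
            tss, tts = start, end   # forward strand: transcription runs start -> end
        elif strand == "-":
            tss, tts = end, start   # reverse strand: transcription runs end -> start
        else:
            continue  # unstranded features have no defined TSS
        results.append((chrom, strand, tss, tts))
    return results


# Rows copied from the question; the attribute column is elided there too.
example = """\
##gff-version 3
chr1    ensGene    gene       11869    14412    .    +    .    Name=...
chr1    ensGene    ncRNA    11869    14409    .    +    .    Name=...
chr1    ensGene    exon       11869    12227    .    +    .    Name=...
chr1    ensGene    gene       14363    29806    .    -    .    Name=...
chr1    ensGene    ncRNA    14363    29370    .    -    .    Name=...
chr1    ensGene    exon       14363    14829    .    -    .    Name=...
"""

for row in transcript_ends(example.splitlines()):
    print(*row, sep="\t")
```

For the two transcripts in the excerpt this gives a TSS of 11869 on the forward strand and 29370 on the reverse strand. Working per transcript rather than per gene keeps every alternative TSS instead of only the most 5' one.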

Thank you very much for your help. For some reason I never considered the cDNA features in this file, but it actually makes perfect sense :)

written 4.6 years ago by thefirstrealace30

Hi Emily, my annotation has "gene", "transcript" and "exon" features. Should I define the TSS from the transcript start or from the gene start?

written 3.0 years ago by sgupt460

TSS is the start of the transcript.

written 2.9 years ago by Emily_Ensembl21k
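
Once a transcript's TSS is known, the distance to a splice event that the original question asks about could be computed along these lines. This is only a sketch: `distance_from_tss` is a hypothetical helper, the event position is assumed to be a single 1-based coordinate on the same chromosome, and nothing here relies on MISO's actual output format. The sign convention (positive means downstream of the TSS in the direction of transcription) is a choice made for this example.

```python
def distance_from_tss(position, tss, strand):
    """Signed distance in bases from a TSS to a position on the same chromosome.

    Positive values lie downstream of the TSS in the direction of
    transcription, negative values lie upstream.
    """
    if strand == "+":
        return position - tss
    if strand == "-":
        return tss - position
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


# Coordinates taken from the excerpt in the question, used here purely as
# stand-ins for splice-event positions.
print(distance_from_tss(12613, 11869, "+"))  # start of the second exon, forward strand
print(distance_from_tss(14829, 29370, "-"))  # end coordinate of the listed exon, reverse strand
```

The same function works for the TTS by passing the TTS coordinate in place of `tss`; the sign then says whether the event lies before or after the end of the transcript.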