I have seen many, many papers mentioning "GENCODE TSS".
However, upon looking at the GENCODE GTF file downloaded from the GENCODE website (e.g. gencode.vXX.annotation.gtf.gz), I didn't see any obvious "TSS" entry.
So how does one go about defining a "GENCODE TSS"? What does this statement EVEN mean??
My theory: within the GENCODE GTF file, I noticed that a (protein-coding) gene often has multiple "transcript" entries. Am I right in saying that the start coordinate (for + strand genes) or the end coordinate (for - strand genes) of each transcript would be a TSS of that gene?
So, for example, if gene A (+ strand) has 3 transcripts, am I right in saying that the START coordinates of these transcripts represent the 3 TSSs of gene A?
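To make the theory concrete, here is a minimal sketch of what I mean, assuming "TSS" simply means the 5' end of each row whose feature type (column 3) is "transcript": column 4 (start) on the + strand, column 5 (end) on the - strand. This is my own interpretation, not an official GENCODE definition, and the function names (`parse_attributes`, `transcript_tss`, `tss_per_gene`) are just made up for illustration.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
# Unquoted attributes such as `level 2;` are simply ignored.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return a dict of attribute key -> value (first occurrence wins,
    since GENCODE repeats keys like `tag`)."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def transcript_tss(lines):
    """Yield (gene_id, transcript_id, chrom, strand, tss) per transcript row.

    ASSUMPTION: the TSS is the 5' end of the transcript record, i.e.
    column 4 (start) on '+' and column 5 (end) on '-'. Coordinates stay
    1-based, as in the GTF.
    """
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        tss = start if strand == "+" else end
        yield attrs["gene_id"], attrs["transcript_id"], chrom, strand, tss


def tss_per_gene(lines):
    """Collect the distinct TSS positions of each gene.

    A set is used because several transcripts can share the same 5' end,
    so 3 transcripts do not necessarily give 3 distinct TSSs.
    """
    genes = defaultdict(set)
    for gene_id, _tx_id, chrom, strand, tss in transcript_tss(lines):
        genes[gene_id].add((chrom, strand, tss))
    return genes
```

The functions take any iterable of lines, so they should work on the real file opened with `gzip.open(path, "rt")` as well as on a small hand-written list for testing.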
HOWEVER, how do you handle the case where, for an alternative transcript of gene A, the first exon is NOT the first transcribed exon (due to splicing)?
In that case, wouldn't it be wrong to define the start of that exon as the TSS? (The real TSS should belong to the spliced-out exon instead.)
What do you guys think of this?