I need the transcription start sites (TSSs) of all protein-coding genes in the human genome. I only want canonical transcripts, with exactly one TSS per gene. Can someone please tell me which file, from which source, provides this information?
I tried the refGene.txt file from UCSC and the GENCODE GTF file for basic gene annotation, but both give multiple TSSs for a single gene. I also looked into the refseq_select dataset (which consists of one representative transcript for every gene), but I think it hasn't been quality-checked and released yet.
Any suggestions would be really helpful.
Thank you, that really helped, but I don't think it covers all protein-coding genes. When I analyzed its GTF file, I found it has 16,230 genes in total.
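For anyone who wants to check the same thing, here is a minimal sketch of how a count like this could be made from a GTF file. It is not necessarily how the number above was obtained. It assumes the file has `transcript` rows and stores the biotype under `gene_type` (GENCODE style) or `gene_biotype` (Ensembl style); other GTF files may use different keys. The function name `tss_by_gene` and the sample lines are made up for illustration. The TSS is taken as the start coordinate on the `+` strand and the end coordinate on the `-` strand.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the 9th GTF column, e.g. gene_id "ENSG...".
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def tss_by_gene(gtf_lines, biotype="protein_coding"):
    """Map gene_id -> set of (chrom, tss, strand) over its transcript rows.

    Assumption: the biotype is stored as gene_type or gene_biotype.
    """
    tss = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript rows define a TSS
        # For repeated keys such as tag, the last value wins; fine here.
        attrs = dict(ATTR_RE.findall(cols[8]))
        if (attrs.get("gene_type") or attrs.get("gene_biotype")) != biotype:
            continue
        strand = cols[6]
        # GTF coordinates are 1-based; the TSS is the 5' end of the transcript.
        pos = int(cols[4]) if strand == "-" else int(cols[3])
        tss[attrs["gene_id"]].add((cols[0], pos, strand))
    return tss


def row(chrom, start, end, strand, gene, tx, gtype):
    """Build one synthetic GTF transcript line (illustrative data only)."""
    attrs = f'gene_id "{gene}"; transcript_id "{tx}"; gene_type "{gtype}";'
    return "\t".join([chrom, "demo", "transcript", str(start), str(end),
                      ".", strand, ".", attrs])


sample_lines = [
    row("chr1", 100, 500, "+", "G1", "T1", "protein_coding"),
    row("chr1", 150, 500, "+", "G1", "T2", "protein_coding"),
    row("chr2", 300, 900, "-", "G2", "T3", "protein_coding"),
    row("chr3", 10, 90, "+", "G3", "T4", "lncRNA"),
]

tss = tss_by_gene(sample_lines)
print("protein-coding genes:", len(tss))
print("genes with more than one TSS:",
      sum(1 for sites in tss.values() if len(sites) > 1))
```

On a real file, the same function can be fed an open file handle, e.g. `with open(path) as fh: tss = tss_by_gene(fh)`, where `path` is wherever the GTF was saved. Comparing `len(tss)` between annotation files shows how many protein-coding genes each one covers.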