What are the best databases for looking up the transcription start sites of specific genes in the human genome?
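 # Note: in this table $5/$6 are txStart/txEnd and $7/$8 are cdsStart/cdsEnd, so the command below
 # keeps coding transcripts only and prints the CDS start (1 bp on "+", the 3-bp start codon on "-").
 # For the transcription start site itself, use $5 and $6 in place of $7 and $8.
 wget -q  -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" | gunzip -c  | awk '(int($7)< int($8)) {if($4=="+") {printf("%s\t%d\t%d\t%s\t%s\n",$3,$7,int($7)+1,$2,$4);}else {printf("%s\t%d\t%d\t%s\t%s\n",$3,int($8)-3,$8,$2,$4);}}' 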
chr1    69090   69091   ENST00000335137.3   +
chr1    139306  139309  ENST00000423372.3   -
chr1    367658  367659  ENST00000426406.1   +
chr1    622031  622034  ENST00000332831.2   -
chr1    739134  739137  ENST00000599533.1   -
chr1    818042  818043  ENST00000594233.1   +
chr1    861321  861322  ENST00000342066.3   +
chr1    866442  866445  ENST00000598827.1   -
chr1    894617  894620  ENST00000327044.6   -
chr1    896073  896074  ENST00000338591.3   +
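The same strand logic can be sketched in Python for the transcript boundaries rather than the CDS boundaries. This is a minimal sketch, assuming the UCSC genePred layout with a leading bin column (bin, name, chrom, strand, txStart, txEnd, ...); the function name `tss_bed` and the two sample rows are made up for illustration and are not real annotation.

```python
import csv
import io

# Assumed column order of a UCSC genePred table that starts with a "bin" field.
BIN, NAME, CHROM, STRAND, TX_START, TX_END = range(6)


def tss_bed(rows):
    """Yield (chrom, start, end, name, strand) as 1-bp BED intervals, one per transcript."""
    for row in rows:
        strand = row[STRAND]
        if strand == "+":
            # txStart is 0-based, so it is already the BED start of the first base.
            start = int(row[TX_START])
        else:
            # txEnd is exclusive; the last transcribed base is the TSS on the minus strand.
            start = int(row[TX_END]) - 1
        yield (row[CHROM], start, start + 1, row[NAME], strand)


# Two hypothetical rows, truncated to the first eight genePred columns.
sample = (
    "1\tTX_PLUS\tchr1\t+\t1000\t5000\t1200\t4800\n"
    "1\tTX_MINUS\tchr1\t-\t2000\t9000\t2500\t8700\n"
)
records = list(tss_bed(csv.reader(io.StringIO(sample), delimiter="\t")))
for rec in records:
    print("\t".join(map(str, rec)))
```

To run it on the real table, the rows could be read from the decompressed download instead of the inline sample.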
Basically any GTF file will do, whether from RefSeq, Ensembl or GENCODE. The TSS is the start coordinate of the entries of type transcript. Be aware that for genes on the bottom (minus) strand it is the end coordinate instead. Some GTFs also carry a TSS annotation that you can use directly.
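As a rough illustration of that rule, here is a minimal Python sketch that reads GTF lines, keeps the `transcript` features, and takes the start coordinate on the plus strand and the end coordinate on the minus strand. The helper name `gtf_tss` and the sample lines are hypothetical; a real GENCODE or Ensembl file would be read line by line in the same way.

```python
def gtf_tss(lines):
    """Return {transcript_id: (chrom, tss, strand)} with 1-based TSS positions."""
    tss = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level features define a TSS here
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Pull transcript_id out of the attribute column (key "value"; pairs).
        transcript_id = None
        for item in fields[8].split(";"):
            item = item.strip()
            if item.startswith("transcript_id "):
                transcript_id = item.split(" ", 1)[1].strip('"')
        if transcript_id is None:
            continue
        # GTF coordinates are 1-based and inclusive on both ends.
        tss[transcript_id] = (chrom, end if strand == "-" else start, strand)
    return tss


# Hypothetical GTF lines, not real annotation.
sample_gtf = [
    "#!example header\n",
    'chr1\tdemo\ttranscript\t1001\t5000\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tdemo\texon\t1001\t1500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tdemo\ttranscript\t2001\t9000\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]
result = gtf_tss(sample_gtf)
print(result)
```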
Here is a simple pythonic way to use biomart:
import pybiomart as pbm
dataset = pbm.Dataset(name='hsapiens_gene_ensembl',  host="http://sep2019.archive.ensembl.org/")
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand', 'external_gene_name', 'transcript_biotype'])
Below is what the annot results look like:
Chromosome/scaffold name  Transcription start site (TSS)  Strand  Gene name  Transcript type
MT                        577                             1       MT-TF      Mt_tRNA
MT                        648                             1       MT-RNR1    Mt_rRNA
MT                        1602                            1       MT-TV      Mt_tRNA
MT                        1671                            1       MT-RNR2    Mt_rRNA
MT                        3230                            1       MT-TL1     Mt_tRNA
...                       ...                             ...     ...        ...
chr1                      228416627                       -1      TRIM17     protein_coding
chr1                      228416652                       -1      TRIM17     protein_coding
...                       ...                             ...     ...        ...
You can find the TSS for all transcripts of a given gene by querying BioMart.
Seems that DBTSS doesn't work!
You can use Bioconductor, as shown in this post using GenomicRanges: https://support.bioconductor.org/p/46508/