Transcription Start Site
2.0 years ago

What are the best databases for looking up the transcription start sites (TSS) of specific genes in the human genome?


You can find the TSS for every transcript of a given gene by querying Ensembl BioMart.


It seems that DBTSS doesn't work!


You can use Bioconductor, as shown in this post using GenomicRanges: https://support.bioconductor.org/p/46508/

2.0 years ago
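 wget -q -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" |
   gunzip -c |
   awk '(int($7) < int($8)) {
     # Columns: $2 name, $3 chrom, $4 strand, $7 cdsStart, $8 cdsEnd (genePred with a leading bin column)
     if ($4 == "+") {
       # + strand: 1-bp interval starting at $7
       printf("%s\t%d\t%d\t%s\t%s\n", $3, $7, int($7)+1, $2, $4);
     } else {
       # - strand: 3-bp interval ending at $8
       printf("%s\t%d\t%d\t%s\t%s\n", $3, int($8)-3, $8, $2, $4);
     }
   }'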


chr1    69090   69091   ENST00000335137.3   +
chr1    139306  139309  ENST00000423372.3   -
chr1    367658  367659  ENST00000426406.1   +
chr1    622031  622034  ENST00000332831.2   -
chr1    739134  739137  ENST00000599533.1   -
chr1    818042  818043  ENST00000594233.1   +
chr1    861321  861322  ENST00000342066.3   +
chr1    866442  866445  ENST00000598827.1   -
chr1    894617  894620  ENST00000327044.6   -
chr1    896073  896074  ENST00000338591.3   +

Out of curiosity, why did you add 1 on the + strand and subtract 3 on the - strand?

chr1    69090   69091   ENST00000335137.3   +
chr1    139306  139309  ENST00000423372.3   -

The resulting intervals have a length of 1 on the + strand and 3 on the - strand.
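For comparison, here is a minimal Python sketch of one way to get a uniform 1-bp interval on both strands. It assumes the table follows the UCSC genePred layout with a leading bin column, so that columns 5 and 6 are txStart and txEnd (and columns 7 and 8, used in the command above, are cdsStart and cdsEnd). The function name and the sample rows are hypothetical and only serve to illustrate the arithmetic.

```python
def tss_bed_from_genepred(line):
    """Return (chrom, start, end, name, strand) for the 1-bp TSS of one genePred row."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...
    name, chrom, strand = fields[1], fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])
    if strand == "+":
        # Starts are 0-based, so txStart is already the BED start of the TSS
        return (chrom, tx_start, tx_start + 1, name, strand)
    # On the - strand the TSS is the last base of the transcript: [txEnd - 1, txEnd)
    return (chrom, tx_end - 1, tx_end, name, strand)


# Hypothetical rows, not taken from the real table
plus_row = "585\tTX_PLUS\tchr1\t+\t100\t500\t150\t450"
minus_row = "585\tTX_MINUS\tchr1\t-\t100\t500\t150\t450"
print(tss_bed_from_genepred(plus_row))
print(tss_bed_from_genepred(minus_row))
```

With this convention every output interval is exactly one base long, regardless of strand.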

2.0 years ago
ATpoint 82k

Basically any GTF file will do, whether from RefSeq, Ensembl, or GENCODE. The TSS is the start coordinate of the entries with feature type transcript. Be aware that for genes on the bottom (minus) strand it is the end coordinate instead. Some annotation files may also have a TSS entry that you can use directly.
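As a rough illustration of the strand rule, here is a small Python sketch that reads GTF lines and reports one TSS per transcript entry. It assumes the standard nine tab-separated GTF columns and a `transcript_id "..."` attribute; the function name `tss_from_gtf` and the example lines are made up for the demonstration.

```python
import re


def tss_from_gtf(lines):
    """Yield (chrom, tss, strand, transcript_id) per 'transcript' feature; tss is 1-based."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only transcript-level entries
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        transcript_id = match.group(1) if match else None
        # + strand: TSS is the start coordinate; - strand: TSS is the end coordinate
        tss = start if strand == "+" else end
        yield chrom, tss, strand, transcript_id


# Hypothetical GTF lines for illustration only
gtf_lines = [
    "#!example header",
    "\t".join(["chr1", "example", "transcript", "1000", "2000", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chr1", "example", "exon", "1000", "1200", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chr1", "example", "transcript", "3000", "4000", ".", "-", ".",
               'gene_id "G2"; transcript_id "T2";']),
]
for record in tss_from_gtf(gtf_lines):
    print(record)
```

For a real file you would pass an open file handle (or a `gzip.open(..., "rt")` handle) instead of the list.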

5 weeks ago
el24 ▴ 40

Here is a simple Pythonic way to query BioMart:

import pybiomart as pbm

# Connect to the September 2019 Ensembl archive and select the human gene dataset
dataset = pbm.Dataset(name='hsapiens_gene_ensembl', host='http://sep2019.archive.ensembl.org/')
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand',
                                  'external_gene_name', 'transcript_biotype'])

Below is what the annot results look like:

Chromosome/scaffold name  Transcription start site (TSS)  Strand  Gene name  Transcript type
MT                        577                             1       MT-TF      Mt_tRNA
MT                        648                             1       MT-RNR1    Mt_rRNA
MT                        1602                            1       MT-TV      Mt_tRNA
MT                        1671                            1       MT-RNR2    Mt_rRNA
MT                        3230                            1       MT-TL1     Mt_tRNA
...                       ...                             ...     ...        ...
chr1                      228416627                       -1      TRIM17     protein_coding
chr1                      228416652                       -1      TRIM17     protein_coding
...                       ...                             ...     ...        ...
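To pull out the TSSs of one specific gene from such a result, something like the following sketch could work. It assumes the result has been turned into a list of dicts (for a pandas DataFrame, `annot.to_dict("records")` does this) with the column names shown above; the helper name `tss_for_gene` and the extra MT-TF row in the toy input are only for illustration.

```python
def tss_for_gene(records, gene_name):
    """Return the sorted, unique (chromosome, TSS, strand) tuples for one gene."""
    hits = {
        (r["Chromosome/scaffold name"], r["Transcription start site (TSS)"], r["Strand"])
        for r in records
        if r["Gene name"] == gene_name
    }
    return sorted(hits)


# Toy input: rows copied from the output shown above
records = [
    {"Chromosome/scaffold name": "MT", "Transcription start site (TSS)": 577,
     "Strand": 1, "Gene name": "MT-TF", "Transcript type": "Mt_tRNA"},
    {"Chromosome/scaffold name": "chr1", "Transcription start site (TSS)": 228416627,
     "Strand": -1, "Gene name": "TRIM17", "Transcript type": "protein_coding"},
    {"Chromosome/scaffold name": "chr1", "Transcription start site (TSS)": 228416652,
     "Strand": -1, "Gene name": "TRIM17", "Transcript type": "protein_coding"},
]
print(tss_for_gene(records, "TRIM17"))
```

A gene with several transcripts can have several distinct TSSs, which is why the helper returns a list rather than a single position.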