Question: Whole genome coordinates of promoters/gene regulatory elements
biocyberman (810) wrote 4.1 years ago:

I am trying to find a database that lets me download and compile a table like this:

Organism Chromosome Gene GeneStart GeneEnd PromoterStart PromoterEnd

Which database can I start with to download and extract this information? I want to do it for human, mouse and rat, with the latest genome version possible. I am comfortable doing any text manipulation, provided the information is there for me to download.

Update 1

Purpose of making this table:

I want to make this table to help me choose the upstream regions of genes whose methylation status may affect gene expression. These can be the TSS, polymerase binding sites, transcription factor binding sites, etc.: anything involved in the regulation of gene expression.


mouse human rat gene genome • 1.6k views
modified 3.5 years ago by Pawel Osipowski (20) • written 4.1 years ago by biocyberman (810)

What is your definition of promoter?

modified 6 weeks ago by RamRS (25k) • written 4.1 years ago by Sean Davis (25k)

Please see update 1

written 4.1 years ago by biocyberman (810)

If you can define your promoter, then you could use the TSS from DBTSS to compile your table.

written 4.1 years ago by Sandeep (250)

Giovanni M Dall'Olio (26k), London, UK, wrote 4.1 years ago:

You can access all gene/promoter coordinates from R, using the TxDb objects:

> source("")
> biocLite("Homo.sapiens")
> library(Homo.sapiens)
> promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))
GRanges object with 23056 ranges and 1 metadata column:
        seqnames                 ranges strand   |     gene_id
           <Rle>              <IRanges>  <Rle>   | <character>
      1    chr19 [ 58874015,  58876214]      -   |           1
     10     chr8 [ 18246755,  18248954]      +   |          10
    100    chr20 [ 43280177,  43282376]      -   |         100
   1000    chr18 [ 25757246,  25759445]      -   |        1000
  10000     chr1 [244006687, 244008886]      -   |       10000
    ...      ...                    ...    ... ...         ...
   9991     chr9 [115095745, 115097944]      -   |        9991
   9992    chr21 [ 35734323,  35736522]      +   |        9992
   9993    chr22 [ 19109768,  19111967]      -   |        9993
   9994     chr6 [ 90537619,  90539818]      +   |        9994
   9997    chr22 [ 50964706,  50966905]      -   |        9997

You can get similar information for mouse and rat by installing the corresponding packages (such as Mus.musculus and Rattus.norvegicus). Use transcripts() instead of genes() to get one promoter per transcript rather than one per gene.

Note that with this method the promoter is defined simply as a 2200 bp window around the start of each gene (the promoters() defaults are 2000 bp upstream and 200 bp downstream of the TSS), without any validation that this region is actually the gene's promoter. Depending on the purpose, that may be fine.
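To make the window arithmetic concrete, here is a minimal Python sketch of a strand-aware promoter window. It assumes 1-based inclusive coordinates and the 2000/200 defaults; the function names `promoter_window` and `gene_tss` are made up for illustration and are not part of any package. The two TSS positions in the example are the ones implied by the first two rows of the output above, if the defaults were used.

```python
def promoter_window(tss, strand, upstream=2000, downstream=200):
    """Return the 1-based inclusive (start, end) of a promoter window around a TSS.

    Assumption: the window covers `upstream` bases before the TSS and
    `downstream` bases starting at the TSS itself, measured along the strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream - 1
    if strand == "-":
        # On the minus strand, "upstream" means higher coordinates.
        return tss - downstream + 1, tss + upstream
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def gene_tss(gene_start, gene_end, strand):
    """TSS of a gene range: its start on '+', its end on '-'."""
    return gene_start if strand == "+" else gene_end


print(promoter_window(58874214, "-"))  # (58874015, 58876214)
print(promoter_window(18248755, "+"))  # (18246755, 18248954)
```

With `gene_tss` you can go from GeneStart/GeneEnd columns to PromoterStart/PromoterEnd for a table like the one in the question. Whether 2000/200 is a sensible window is a separate question that depends on how you define the promoter.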

modified 6 weeks ago by RamRS (25k) • written 4.1 years ago by Giovanni M Dall'Olio (26k)

I would prefer experimentally validated coordinates. But if there is no such thing, I will use this.

written 4.1 years ago by biocyberman (810)

Pawel Osipowski (20), Poland, Warsaw, wrote 3.5 years ago:

For other species (this seems to be the only post that discusses promoters), a useful tool might be 'bedtools flank'. From a BED/GFF/GTF file you can produce a file of intervals flanking the start codon. I used it to generate a GTF of promoter intervals reaching 1000 nt upstream and 500 nt downstream of each start_codon. This does the job:

bedtools flank -s -i <StartCodonInfile> -g <contigName_contigLength_table> -l 1000 -r 500 > <promotersOutfile>
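As a rough illustration of what a strand-aware flank with `-l 1000 -r 500` computes, here is a small Python sketch. It is an approximation under stated assumptions, not bedtools' actual implementation: coordinates are BED-style (0-based, half-open), flanks are clipped to the contig, and the function name `flank_stranded` is hypothetical.

```python
def flank_stranded(start, end, strand, chrom_len, left=1000, right=500):
    """Return (upstream, downstream) flanks as 0-based half-open (start, end) pairs.

    Assumption: with strand awareness, `left` is taken on the upstream side
    and `right` on the downstream side of the feature, so the two sides swap
    for minus-strand features. The feature itself is not included.
    """
    if strand == "-":
        upstream = (end, end + left)
        downstream = (start - right, start)
    else:
        upstream = (start - left, start)
        downstream = (end, end + right)

    def clip(interval):
        # Keep the interval inside [0, chrom_len].
        return max(0, interval[0]), min(chrom_len, interval[1])

    return clip(upstream), clip(downstream)


# Toy start codon at [5000, 5003) on a 100000 bp contig (illustrative values)
print(flank_stranded(5000, 5003, "+", 100000))  # ((4000, 5000), (5003, 5503))
print(flank_stranded(5000, 5003, "-", 100000))  # ((5003, 6003), (4500, 5000))
```

Note that this gives two separate intervals per start codon and leaves the codon itself out. If you want one contiguous interval per gene, extending the feature instead (for example with `bedtools slop`, which also accepts `-s`, `-l` and `-r`) may be closer to what you need.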
modified 3.5 years ago • written 3.5 years ago by Pawel Osipowski (20)