Whole genome coordinates of promoters/gene regulatory elements
5.9 years ago
biocyberman ▴ 830

I am trying to find a database that lets me download and compile a table like this:

 Organism Chromosome Gene GeneStart GeneEnd PromoterStart PromoterEnd

Which database can I start with to download and extract this information? I want to do it for human, mouse and rat, using the latest genome version available. I am comfortable doing any text manipulation, provided the information is there for me to download.

Update 1

Purpose of making this table:

I want to make this table to help me choose the upstream regions of genes that may affect gene expression through their methylation status. These can be TSSs, polymerase binding sites, transcription factor binding sites, etc.: anything involved in the regulation of gene expression.

genome gene rat human mouse • 2.1k views

What is your definition of promoter?


If you can define your promoter, then you could use the TSS from DBTSS to compile your table.

5.9 years ago

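You can access all gene/promoter coordinates from R, using the TxDb objects (biocLite() has since been retired; current Bioconductor releases install packages through BiocManager):

> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install("Homo.sapiens")
> library(Homo.sapiens)
> promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))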
GRanges object with 23056 ranges and 1 metadata column:
seqnames                 ranges strand   |     gene_id
<Rle>              <IRanges>  <Rle>   | <character>
1    chr19 [ 58874015,  58876214]      -   |           1
10     chr8 [ 18246755,  18248954]      +   |          10
100    chr20 [ 43280177,  43282376]      -   |         100
1000    chr18 [ 25757246,  25759445]      -   |        1000
10000     chr1 [244006687, 244008886]      -   |       10000
...      ...                    ...    ... ...         ...
9991     chr9 [115095745, 115097944]      -   |        9991
9992    chr21 [ 35734323,  35736522]      +   |        9992
9993    chr22 [ 19109768,  19111967]      -   |        9993
9994     chr6 [ 90537619,  90539818]      +   |        9994
9997    chr22 [ 50964706,  50966905]      -   |        9997


You can get similar information for mouse and rat just by installing the corresponding packages. Use transcripts() instead of genes() to get one promoter per transcript rather than one per gene.

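Note that with this method the promoter is defined simply as a fixed 2200 bp window around the start of each gene (by default, promoters() takes 2000 bp upstream and 200 bp downstream of the TSS; both can be changed with the upstream and downstream arguments), without any specific validation of whether this is the actual promoter of the gene. Whether that is good enough depends on the purpose.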
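
To make the window arithmetic concrete, here is a minimal Python sketch of a strand-aware promoter window. It is not the Bioconductor implementation; the function name promoter_window is hypothetical, coordinates are assumed to be 1-based and inclusive, and the defaults of 2000 bp upstream and 200 bp downstream mirror the promoters() defaults. The two TSS positions in the example are back-calculated from the first two rows of the output above, not taken from an annotation file.

```python
def promoter_window(tss, strand, upstream=2000, downstream=200):
    """Return (start, end) of a promoter window around a TSS.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    The window covers `upstream` bases before the TSS and `downstream`
    bases starting at the TSS, following the direction of transcription.
    """
    if strand == "+":
        # Upstream is towards smaller coordinates on the plus strand.
        return tss - upstream, tss + downstream - 1
    if strand == "-":
        # Upstream is towards larger coordinates on the minus strand.
        return tss - downstream + 1, tss + upstream
    raise ValueError("strand must be '+' or '-'")


# TSS values inferred from the first two rows of the GRanges output.
print(promoter_window(58874214, "-"))  # (58874015, 58876214)
print(promoter_window(18248755, "+"))  # (18246755, 18248954)
```

Both results match the chr19 and chr8 rows shown above, which is why the 2200 bp window is best read as 2000 bp upstream plus 200 bp downstream rather than 2200 bp upstream. The sketch does not clip windows at chromosome ends.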


I would prefer experimentally validated coordinates. But if there is no such thing, I will use this.

5.3 years ago

For other species, since this is the only post that deals with promoters, a useful tool might be 'bedtools flank'. From a bed/gff/gtf file you can produce a file with intervals flanking the start codon. I did this to generate a gtf with promoter intervals reaching 500 nt downstream and 1000 nt upstream of each start_codon. This does the job:

bedtools flank -s -i <StartCodonInfile> -g <contigName_contigLength_table> -l 1000 -r 500 > <promotersOutfile>
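
For readers who want to check the coordinates by hand, the following Python sketch mimics what a strand-aware flank with -l 1000 -r 500 is understood to compute. It is an illustration only, not bedtools code: the function name flank_intervals is hypothetical, intervals are assumed to be BED-style (0-based, half-open), and the example start codon and contig length are made up. As far as I know, bedtools flank reports the two flanks as separate records that exclude the feature itself and clips them to the contig length given with -g, which the sketch imitates.

```python
def flank_intervals(start, end, strand, contig_len, upstream=1000, downstream=500):
    """Return the two flanking intervals of a feature as (start, end) tuples.

    Intervals are 0-based, half-open (an assumption of this sketch).
    With strand awareness, `upstream` lies to the left of a '+' feature
    and to the right of a '-' feature; `downstream` is the opposite side.
    """
    if strand == "-":
        left_size, right_size = downstream, upstream
    else:
        left_size, right_size = upstream, downstream
    # Clip both flanks so they stay inside the contig.
    left_flank = (max(0, start - left_size), start)
    right_flank = (end, min(contig_len, end + right_size))
    return [left_flank, right_flank]


# Hypothetical start codon at [5000, 5003) on a 100 kb contig.
print(flank_intervals(5000, 5003, "+", 100_000))  # [(4000, 5000), (5003, 5503)]
print(flank_intervals(5000, 5003, "-", 100_000))  # [(4500, 5000), (5003, 6003)]
```

Because the flanks come out as two records per start codon, a single contiguous promoter interval per gene would need an extra merging step.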