Question: Ensembl regulatory build promoter region and gene name
3.8 years ago, bharata1803 wrote:


I have a question about the Ensembl Regulatory Build from this link. The regulation data available there is in GTF/GFF format. Below is an example of the data I downloaded:

chromosome  project Name  feature  start  end  score  strand  frame  attr
10  Regulatory_Build  promoter  73594  74193  .  .  .  "Name=Promoter;ID=ENSR00000349338;activity=inactive;bound_start=73594;bound_end=74793;Note=Consists of following features: H3K4me3,H3K4me3,H3K4me2,DNase1,CTCF(MA0139.1),CTCF(MA0139.1),CTCF(MA0139.1),Rad21"
10  Regulatory_Build  promoter  76194  76793  .  .  .  "Name=Promoter;ID=ENSR00000349339;activity=inactive;bound_start=75994;bound_end=76993;Note=Consists of following features: CTCF(MA0139.1)"

My question is: how can I find out which gene each promoter region belongs to? I checked this location manually in IGV, but I don't know how to do this systematically. The result seems weird because there is a gene located in that promoter region. Does anyone have a suggestion? Thank you.

3.8 years ago, Devon Ryan (Freiburg, Germany) wrote:

bedtools intersect or findOverlaps() (via GenomicRanges in R) or any of the other "interval overlap" tools.
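
As a rough illustration of what these interval-overlap tools compute, here is a minimal pure-Python sketch. It is not how bedtools or GenomicRanges are implemented (they use much faster algorithms); the promoter coordinates are taken from the question, while the gene names and gene coordinates are hypothetical placeholders.

```python
from collections import namedtuple

# 1-based, inclusive coordinates, as in GFF
Interval = namedtuple("Interval", "chrom start end name")

def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def find_overlaps(features, genes):
    """Return (feature name, gene name) pairs; brute force, O(n*m)."""
    return [(f.name, g.name) for f in features for g in genes if overlaps(f, g)]

# Promoter records from the question
promoters = [
    Interval("10", 73594, 74193, "ENSR00000349338"),
    Interval("10", 76194, 76793, "ENSR00000349339"),
]
# Hypothetical genes, for illustration only
genes = [
    Interval("10", 74000, 75000, "GENE_A"),
    Interval("11", 76000, 77000, "GENE_B"),  # different chromosome
]

pairs = find_overlaps(promoters, genes)
print(pairs)
```

For real data, the dedicated tools are the better choice, since the brute-force loop here does not scale to genome-wide annotation.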


Thank you for your reply. I am a bit confused about the location of the promoter. From the documentation I have read, the promoter should be upstream of the gene and not within the gene itself. So, as I imagine it, if a gene's transcription start is at 100,000 and its transcription end is at 101,000, the promoter should lie before 100,000 (using a 2000 bp range, the promoter is between 98,000 and 99,999). The gene GTF annotation does not overlap this interval. What is your opinion?

written 3.8 years ago by bharata1803

In R, one would load an appropriate TxDb object and use the promoters() function to get the promoter intervals, thereafter using findOverlaps(). For bedtools, one would first use BioMart to get the promoter intervals and then use bedtools intersect. One can think of a large number of other ways to do this.
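
The upstream window from the 100,000 to 101,000 example earlier in the thread can be sketched as below. This is only an illustration of the arithmetic, not a reimplementation of promoters(); the function name `upstream_window` and the default width of 2000 bp are assumptions made for this example. The key point is that the window depends on the strand: for a gene on the minus strand, the transcription start is the larger coordinate and "upstream" means higher positions.

```python
def upstream_window(gene_start, gene_end, strand, width=2000):
    """Return (start, end) of the `width` bp immediately upstream of the TSS.

    Coordinates are 1-based and inclusive. The window is clamped at
    position 1 but not at the chromosome end (its length is unknown here).
    """
    if strand == "+":
        tss = gene_start          # transcription starts at the lower coordinate
        return max(1, tss - width), tss - 1
    if strand == "-":
        tss = gene_end            # transcription starts at the higher coordinate
        return tss + 1, tss + width
    raise ValueError("strand must be '+' or '-'")

plus_window = upstream_window(100000, 101000, "+")
minus_window = upstream_window(100000, 101000, "-")
print(plus_window)   # (98000, 99999)
print(minus_window)  # (101001, 103000)
```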

written 3.8 years ago by Devon Ryan

Thank you. I used a different approach and would like your opinion. First, I downloaded the transcription start and end sites from BioMart. Then I calculated a "hypothetical" promoter region by taking 5 kb upstream of the start site and downstream of the end site. Finally, I used bedtools to intersect these regions with the Ensembl regulatory regions, which gave me the gene and transcript names for each regulatory region. What do you think about that? Another question: I found several Ensembl regulatory regions positioned inside a gene (overlapping an intron and/or exon). Do you think this has some biological interpretation, such as inhibition of transcription, or is it just an artifact? I think the Ensembl regulatory data comes from ChIP-seq. Thank you for your comment.
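
For the intersect step, the regulatory records may need to be turned into simple intervals with an ID attached. The sketch below parses the attribute column of one record from the question and writes it as a BED line (BED is 0-based and half-open, so the start is shifted by one). It assumes the downloaded file is tab-delimited with nine GFF columns; the helper names `parse_attributes` and `gff_to_bed` are made up for this example.

```python
def parse_attributes(attr):
    """Split a 'key=value;key=value' attribute string into a dict."""
    attr = attr.strip().strip('"')
    pairs = (field.split("=", 1) for field in attr.split(";") if "=" in field)
    return {key.strip(): value.strip() for key, value in pairs}

def gff_to_bed(line):
    """Convert one tab-delimited GFF record to a 4-column BED line."""
    cols = line.rstrip("\n").split("\t")
    chrom, start, end = cols[0], int(cols[3]), int(cols[4])
    attrs = parse_attributes(cols[8])
    # GFF is 1-based inclusive; BED is 0-based half-open
    return "\t".join([chrom, str(start - 1), str(end), attrs.get("ID", ".")])

# Second record from the question, assumed to be tab-delimited
record = "\t".join([
    "10", "Regulatory_Build", "promoter", "76194", "76793", ".", ".", ".",
    '"Name=Promoter;ID=ENSR00000349339;activity=inactive;'
    'bound_start=75994;bound_end=76993;'
    'Note=Consists of following features: CTCF(MA0139.1)"',
])

bed_line = gff_to_bed(record)
print(bed_line)
```

The same parsed dictionary also exposes `bound_start` and `bound_end`, in case the wider bounds are preferred over the core start and end.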

written 3.8 years ago by bharata1803

Your method should work fine as well. There are many ways to go about this, all giving the same results :)

Regarding the random binding events in genes, some of these might be functional, others not. There's a lot of random binding that leads to nothing (biology is noisy after all).

written 3.8 years ago by Devon Ryan

Thank you very much for your suggestion.

written 3.8 years ago by bharata1803