Question: How to find out whether ChIP-seq peaks fall within the promoter or enhancer region of a gene?
mayurdoke wrote, 2.9 years ago:

Hello,

I have analyzed ChIP-seq data and used MACS2 for peak calling. As a result, I have the peaks and their corresponding peak regions from MACS2. I would now like to check whether these peaks fall into the enhancer or promoter region of a gene. Could you please let me know of any software or tools available for this? Thank you.

chip-seq • 3.5k views
modified 2.9 years ago by e.rempel • written 2.9 years ago by mayurdoke

What kind of ChIP-seq data is it, and from what tissue or cell line? If you have a chromatin-state annotation (ChromHMM) from ENCODE or the Roadmap Epigenomics project for your tissue of interest, you can check whether your peaks are enriched in enhancer or promoter regions.

Otherwise, you can assume that peaks overlapping an annotated TSS are promoter peaks and the rest are enhancer peaks.
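A minimal sketch of that TSS-overlap fallback, assuming the peaks and TSS positions are already loaded as plain tuples. The window sizes (1000 bp upstream, 100 bp downstream), the function names, and the coordinates are made up for illustration. Non-promoter peaks are labelled "distal" here, since being away from a TSS does not by itself make a region an enhancer.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open
Tss = Tuple[str, int, str]   # (chrom, position, strand)


def promoter_window(tss: Tss, upstream: int = 1000, downstream: int = 100) -> Peak:
    """Return a strand-aware window around a TSS (window sizes are assumptions)."""
    chrom, pos, strand = tss
    if strand == "+":
        start, end = pos - upstream, pos + downstream
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = pos - downstream, pos + upstream
    return chrom, max(0, start), end


def classify_peaks(peaks: List[Peak], tss_sites: List[Tss],
                   upstream: int = 1000, downstream: int = 100) -> List[str]:
    """Label a peak 'promoter' if it overlaps any TSS window, else 'distal'."""
    windows: Dict[str, List[Tuple[int, int]]] = {}
    for tss in tss_sites:
        chrom, start, end = promoter_window(tss, upstream, downstream)
        windows.setdefault(chrom, []).append((start, end))

    labels = []
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        hit = any(start < w_end and w_start < end
                  for w_start, w_end in windows.get(chrom, []))
        labels.append("promoter" if hit else "distal")
    return labels


# Hypothetical example data, not taken from the thread.
peaks = [("chr1", 900, 1100), ("chr1", 50000, 50300), ("chr2", 4950, 5050)]
tss_sites = [("chr1", 1000, "+"), ("chr2", 4000, "-")]
labels = classify_peaks(peaks, tss_sites)
print(labels)
```

The linear scan over all windows is fine for a quick check; for genome-wide peak sets a sorted or indexed lookup (or bedtools) would be the more practical choice.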

written 2.9 years ago by geek_y

Thank you for your reply. It is ChIP-seq data for a transcription factor, generated from human lung endothelial cells. The data look like the attached picture below. Column 1 contains the chromosome, and columns 2 and 3 contain the genomic positions.

enter image description here

modified 2.9 years ago • written 2.9 years ago by mayurdoke
geek_y (Barcelona) wrote, 2.9 years ago:

As far as I know, data for several histone modifications and DHS are available for lung in the Roadmap Epigenomics project. You can check whether they provide a ChromHMM segmentation for lung and intersect your peaks with the different ChromHMM segments to see whether your peaks are enriched in enhancers and promoters.

Update: E096 is the sample ID for lung tissue. So you can see here that H3K4Me1, H3K27Ac, and H3K4Me3 have consistent narrow and broad peaks. From these you can pretty much define active enhancers and promoters in lung.

H3K27Ac + H3K4Me1 = Active enhancers
H3K27Ac + H3K4Me3 = Active promoters
H3K4Me1 without H3K27Ac = Inactive/poised enhancers

Then you can do enrichment analysis on your peaks.

If you are looking for a tool to intersect peaks, you can use bedtools.
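To make the mark combinations above concrete, here is a rough sketch that labels each peak by which histone-mark regions it overlaps. It assumes the three mark tracks are already loaded as lists of (chrom, start, end) tuples. The function names and coordinates are made up, and giving the promoter rule priority when a peak overlaps all three marks is my own assumption. On real BED files, bedtools intersect does the overlap step far more efficiently.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def overlaps_any(peak: Region, regions: List[Region]) -> bool:
    """True if the peak overlaps at least one region on the same chromosome."""
    chrom, start, end = peak
    return any(chrom == r_chrom and start < r_end and r_start < end
               for r_chrom, r_start, r_end in regions)


def chromatin_label(peak: Region, h3k27ac: List[Region],
                    h3k4me1: List[Region], h3k4me3: List[Region]) -> str:
    """Apply the mark combinations from the answer to a single peak."""
    ac = overlaps_any(peak, h3k27ac)
    me1 = overlaps_any(peak, h3k4me1)
    me3 = overlaps_any(peak, h3k4me3)
    if ac and me3:
        return "active_promoter"   # H3K27Ac + H3K4Me3 (checked first: an assumption)
    if ac and me1:
        return "active_enhancer"   # H3K27Ac + H3K4Me1
    if me1:
        return "poised_enhancer"   # H3K4Me1 without H3K27Ac
    return "other"


# Hypothetical example tracks, not real Roadmap data.
h3k27ac = [("chr1", 150, 400), ("chr1", 950, 1200)]
h3k4me1 = [("chr1", 120, 300), ("chr1", 4900, 5200)]
h3k4me3 = [("chr1", 900, 1300)]
peaks = [("chr1", 100, 200), ("chr1", 1000, 1100),
         ("chr1", 5000, 5100), ("chr2", 10, 50)]

peak_labels = [chromatin_label(p, h3k27ac, h3k4me1, h3k4me3) for p in peaks]
print(peak_labels)
```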

modified 2.9 years ago • written 2.9 years ago by geek_y

Thank you very much for your reply! I really appreciate it.

written 2.9 years ago by mayurdoke

I updated the answer.

written 2.9 years ago by geek_y

@geek_y: Hi! Thanks for your answer! It is really helpful!

So, after I intersect my peak file with the enhancer file, how can I do the enrichment analysis? I mean, the intersection will give me the genomic regions that are enhancers, but how do I associate them with genes?

Many thanks for your help!

written 18 months ago by bioinfouser70

You can do enrichment for overlaps using LOLA.
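LOLA itself is an R/Bioconductor package, and the snippet below is not its API or its statistical model. It is only a back-of-the-envelope sketch of what "enrichment" means here: compare the number of peaks that overlap enhancers with the number expected if peaks landed uniformly at random, using a simple binomial model. The function name and all numbers are invented for illustration.

```python
from math import comb
from typing import Tuple


def binomial_enrichment(n_peaks: int, n_overlapping: int,
                        covered_bp: int, genome_bp: int) -> Tuple[float, float]:
    """Fold enrichment and upper-tail binomial p-value for peak overlaps.

    Simplifying assumption: each peak independently hits the annotated
    regions with probability equal to the fraction of the genome they cover.
    """
    p = covered_bp / genome_bp
    if not 0.0 < p < 1.0:
        raise ValueError("covered fraction must be strictly between 0 and 1")
    expected = n_peaks * p
    fold = n_overlapping / expected
    # P(X >= n_overlapping) for X ~ Binomial(n_peaks, p)
    p_value = sum(comb(n_peaks, k) * p**k * (1 - p) ** (n_peaks - k)
                  for k in range(n_overlapping, n_peaks + 1))
    return fold, p_value


# Hypothetical numbers: 100 peaks, 20 overlap enhancers covering 5% of the genome.
fold, p_value = binomial_enrichment(100, 20, 50_000_000, 1_000_000_000)
print(f"fold enrichment = {fold:.2f}, p = {p_value:.2e}")
```

A real analysis would compare against a matched background or universe of regions rather than the whole genome, which is the kind of thing a dedicated tool handles for you.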

To find the target genes of enhancers, you need to check whether there is Hi-C-like data for lung. Otherwise, it's pretty hard. The best you can do is assume that the nearest gene expressed in lung is the target.
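The nearest-expressed-gene heuristic could look roughly like this. It assumes you have a list of gene TSS positions and an expression table for lung. The gene names, the TPM values, and the 1.0 TPM cutoff for "expressed" are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str, int]  # (name, chrom, TSS position)


def nearest_expressed_gene(enhancer: Tuple[str, int, int], genes: List[Gene],
                           expression: Dict[str, float],
                           min_tpm: float = 1.0) -> Optional[Tuple[str, int]]:
    """Return (gene name, distance) for the closest expressed gene, or None.

    Distance is measured from the enhancer midpoint to the gene TSS.
    The expression cutoff is an arbitrary illustration value.
    """
    chrom, start, end = enhancer
    midpoint = (start + end) // 2
    best: Optional[Tuple[str, int]] = None
    for name, gene_chrom, tss in genes:
        if gene_chrom != chrom:
            continue
        if expression.get(name, 0.0) < min_tpm:
            continue  # skip genes that are not expressed in this tissue
        distance = abs(tss - midpoint)
        if best is None or distance < best[1]:
            best = (name, distance)
    return best


# Hypothetical genes and expression values.
genes = [("GENE_A", "chr1", 10000), ("GENE_B", "chr1", 30000), ("GENE_C", "chr2", 500)]
expression = {"GENE_A": 0.0, "GENE_B": 12.5, "GENE_C": 3.0}

target = nearest_expressed_gene(("chr1", 12000, 12400), genes, expression)
print(target)
```

Here GENE_A is physically closer but is skipped because it is not expressed, so GENE_B is returned.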

written 18 months ago by geek_y
Prakash (India) wrote, 2.9 years ago:

Use HOMER; based on its "Distance to TSS" column, you can classify the peaks as promoter or enhancer.
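As a sketch of that idea, the snippet below reads a tab-separated table with a "Distance to TSS" column and labels peaks by distance. The table here is a cut-down, made-up stand-in for real annotation output (which has more columns and a different first header), and the 1000 bp cutoff is an arbitrary assumption. Peaks beyond the cutoff are labelled "distal" rather than "enhancer".

```python
import csv
import io
from typing import Dict


def label_by_tss_distance(table_text: str,
                          max_promoter_distance: int = 1000) -> Dict[str, str]:
    """Label each peak by its absolute distance to the nearest TSS."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    labels = {}
    for row in reader:
        distance = int(row["Distance to TSS"])
        if abs(distance) <= max_promoter_distance:
            labels[row["PeakID"]] = "promoter_proximal"
        else:
            labels[row["PeakID"]] = "distal"
    return labels


# Simplified, hypothetical table; column names other than
# "Distance to TSS" are placeholders.
example_table = (
    "PeakID\tChr\tStart\tEnd\tDistance to TSS\n"
    "peak_1\tchr1\t900\t1100\t-250\n"
    "peak_2\tchr1\t50000\t50300\t48000\n"
    "peak_3\tchr2\t4950\t5050\t1000\n"
)

distance_labels = label_by_tss_distance(example_table)
print(distance_labels)
```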

written 2.9 years ago by Prakash

If something is far from a promoter, it's not necessarily an enhancer.

written 18 months ago by geek_y

I agree with your point, and thank you for the clarification. To define a peak as an enhancer, it needs to overlap an enhancer mark or a DHS site from the given cell type.

written 18 months ago by Prakash

A DHS is still just an open-chromatin region, so it can be a CTCF binding site or an inactive open-chromatin region. To be an enhancer, it should have enhancer marks like H3K27Ac+H3K4Me1, bidirectional CAGE tags, etc.

written 18 months ago by geek_y
e.rempel (Germany, Heidelberg, COS) wrote, 2.9 years ago:

Hi,

If you have the gene model of your organism (usually in .gtf or .gff format, possibly as an R package from Bioconductor), you can use R to find overlaps between called peaks and specific features of your genome (genes, exons, promoters). As far as I know, these gene models don't contain enhancer coordinates. In that case, you could use the methods geek_y described and add enhancers to your model. Both the called peaks and the gene model can be read into R as GRanges objects. There are many methods you can apply to GRanges objects, especially findOverlaps, which could be useful to you.
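For readers not working in R, the first step (getting TSS positions out of a gene model) can be sketched in plain Python. This is not the GRanges workflow described above, just a rough illustration that assumes a standard 9-column GTF with "gene" features and a gene_id attribute. The function name and example records are made up.

```python
import re
from typing import Iterable, List, Tuple

TssRecord = Tuple[str, str, int, str]  # (gene_id, chrom, 0-based TSS, strand)


def tss_from_gtf_lines(lines: Iterable[str]) -> List[TssRecord]:
    """Extract one TSS per 'gene' feature from GTF-formatted lines.

    GTF coordinates are 1-based and inclusive; the TSS is the start of a
    plus-strand gene and the end of a minus-strand gene. Positions are
    converted to 0-based to match BED-style peak files.
    """
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        gene_id = match.group(1) if match else "unknown"
        position = start if strand == "+" else end
        records.append((gene_id, chrom, position - 1, strand))
    return records


# Hypothetical GTF records for illustration.
gtf_lines = [
    "# example annotation",
    'chr1\tsrc\tgene\t1001\t5000\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\texon\t1001\t1200\t.\t+\t.\tgene_id "G1";',
    'chr2\tsrc\tgene\t2001\t8000\t.\t-\t.\tgene_id "G2";',
]

tss_records = tss_from_gtf_lines(gtf_lines)
print(tss_records)
```

From these TSS positions you can build promoter windows and test peaks for overlap, which is the same operation findOverlaps performs on GRanges objects.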

HTH

written 2.9 years ago by e.rempel

The OP already mentioned that it's human lung endothelial cells.

written 2.9 years ago by geek_y