How to find out whether ChIP-seq peaks fall within the promoter or enhancer region of a gene?
3.7 years ago
mayurdoke ▴ 10

Hello,

I have analyzed ChIP-seq data and used MACS2 for peak calling, so I now have the peaks and their corresponding genomic regions from MACS2. How can I check whether these peaks fall into the enhancer or promoter region of a gene? Could you please point me to any software or tools available for this? Thank you.

ChIP-Seq • 4.2k views

What kind of ChIP-seq data is it, and from what tissue or cell line? If you have the regulome definition (ChromHMM) from ENCODE or the Roadmap Epigenomics project for your tissue of interest, you can check whether your peaks are enriched in enhancer/promoter regions.

Otherwise, you can treat the peaks overlapping an annotated TSS as promoter peaks and the rest as enhancer peaks.
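
As a rough illustration of the TSS-based fallback, here is a minimal Python sketch. The 1 kb flank, the function name and the toy coordinates are all assumptions for illustration, not part of any tool. It labels non-promoter peaks as "distal" rather than "enhancer", since being far from a TSS only makes a peak a candidate enhancer. Strand is ignored for simplicity.

```python
from bisect import bisect_left

# Hypothetical promoter window: 1 kb on either side of the TSS (an assumption).
PROMOTER_FLANK = 1000

def classify_peaks(peaks, tss_by_chrom, flank=PROMOTER_FLANK):
    """Label each (chrom, start, end) peak as 'promoter' or 'distal'.

    tss_by_chrom maps chromosome -> sorted list of TSS positions.
    Coordinates follow the BED convention (0-based start, exclusive end).
    """
    labels = []
    for chrom, start, end in peaks:
        tss = tss_by_chrom.get(chrom, [])
        # Index of the first TSS at or beyond the left edge of the extended peak.
        i = bisect_left(tss, start - flank)
        # The peak is a promoter peak if that TSS lies before the extended right edge.
        hit = i < len(tss) and tss[i] < end + flank
        labels.append("promoter" if hit else "distal")
    return labels

# Toy data, made up for illustration only.
peaks = [("chr1", 900, 1100), ("chr1", 50000, 50200), ("chr2", 300, 500)]
tss_by_chrom = {"chr1": [1000, 80000], "chr2": [5000]}
print(classify_peaks(peaks, tss_by_chrom))
```

In practice the TSS positions would come from a gene annotation file, and the flank size is a judgment call.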


Thank you for your reply. It is ChIP-seq data for a transcription factor, performed on human lung endothelial cells. In the peak file, column 1 contains the chromosome, and columns 2 and 3 contain the genomic positions of the peak.

3.7 years ago

As far as I know, data for several histone modifications and DHS are available for lung in the Roadmap Epigenomics project. You can check whether they provide ChromHMM segmentations for lung and intersect your peaks with the different ChromHMM segments to see if your peaks are enriched in enhancers and promoters.

Update: E096 is the sample ID of lung tissue. For this sample there are consistent narrow and broad peak calls for H3K4Me1, H3K27Ac and H3K4Me3. From these you can pretty much define active enhancers and promoters in lung.

H3K27Ac + H3K4Me1 = Active enhancers
H3K27Ac + H3K4Me3 = Active promoters 
H3K4Me1 - H3K27Ac = Inactive/Poised enhancers.
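
The combinations listed above could be applied to a peak roughly like this. It is only a sketch: the function names and the toy intervals are hypothetical, and checking the promoter rule first when all three marks are present is my own assumption.

```python
def overlaps_any(region, intervals):
    """True if region (chrom, start, end) overlaps any interval in the list."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in intervals)

def label_region(region, k27ac, k4me1, k4me3):
    """Label one region using the histone mark combinations listed above."""
    ac = overlaps_any(region, k27ac)
    me1 = overlaps_any(region, k4me1)
    me3 = overlaps_any(region, k4me3)
    # Assumption: if all three marks are present, the promoter rule wins.
    if ac and me3:
        return "active promoter"
    if ac and me1:
        return "active enhancer"
    if me1:
        return "inactive/poised enhancer"
    return "unclassified"

# Toy histone mark peaks, made up for illustration only.
k27ac = [("chr1", 100, 600), ("chr1", 5000, 5600)]
k4me1 = [("chr1", 4900, 5500), ("chr1", 9000, 9400)]
k4me3 = [("chr1", 150, 500)]

tf_peaks = [("chr1", 200, 300), ("chr1", 5100, 5200),
            ("chr1", 9100, 9200), ("chr1", 20000, 20100)]
for peak in tf_peaks:
    print(peak, label_region(peak, k27ac, k4me1, k4me3))
```

The linear scan in `overlaps_any` is fine for a toy example; for real peak files an interval tool such as bedtools will be much faster.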

Then you can do enrichment analysis on your peaks.

If you are looking for a tool to intersect peaks, you can use bedtools.


Thank you very much for your reply! I really appreciate it.


I updated the answer.


@geek_y: Hi! Thanks for your answer! It is really helpful!

So, after I intersect my peak file with the enhancer file, how can I do the enrichment analysis? The intersection will give me the genomic regions that are enhancers, but how do I associate them with genes?

Many thanks for your help!


You can do enrichment for overlaps using LOLA.
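
To give an idea of what an overlap enrichment test computes, here is a minimal Python sketch of a one-sided Fisher's exact (hypergeometric) test against a background "universe" of regions. This is not LOLA's interface, only the general idea; the function name and the counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric (Fisher's exact) p-value, P(X >= k).

    N: number of regions in the universe (background set)
    K: universe regions that overlap an enhancer
    n: peaks of interest (assumed to be a subset of the universe)
    k: peaks of interest that overlap an enhancer
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k or more overlapping peaks by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Made-up counts: 1000 background regions, 100 of them in enhancers,
# 50 peaks of interest, 15 of which overlap enhancers.
print(enrichment_pvalue(k=15, n=50, K=100, N=1000))
```

The choice of universe matters a lot here: testing against all regions where the factor could plausibly bind gives a very different answer from testing against the whole genome.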

To know the target genes of enhancers, you need to check whether there is Hi-C-like data for lung. Otherwise, it's pretty hard. The best option is to assume that the nearest gene expressed in lung is the target.
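
A minimal Python sketch of the nearest-expressed-gene heuristic, assuming you have TSS positions and some expression table for lung. The gene names, the expression values and the `min_tpm` cutoff are all hypothetical.

```python
def nearest_expressed_gene(peak, genes, expression, min_tpm=1.0):
    """Return the name of the closest expressed gene on the peak's chromosome.

    peak: (chrom, start, end)
    genes: list of (chrom, tss, name)
    expression: dict mapping gene name -> expression value
    min_tpm: hypothetical cutoff for calling a gene expressed
    """
    chrom, start, end = peak
    midpoint = (start + end) // 2
    # Keep only expressed genes on the same chromosome, with their distance.
    candidates = [
        (abs(tss - midpoint), name)
        for c, tss, name in genes
        if c == chrom and expression.get(name, 0.0) >= min_tpm
    ]
    return min(candidates)[1] if candidates else None

# Toy annotation and expression values, made up for illustration only.
genes = [("chr1", 1000, "GENE_A"), ("chr1", 6000, "GENE_B"), ("chr1", 20000, "GENE_C")]
expression = {"GENE_A": 12.5, "GENE_B": 0.0, "GENE_C": 8.1}

# GENE_B is closest but not expressed, so it is skipped.
print(nearest_expressed_gene(("chr1", 5000, 5400), genes, expression))
```

Keep in mind this is only a heuristic; enhancers can skip over nearby genes, which is why contact data is preferable when it exists.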

3.7 years ago
Prakash ★ 2.1k

Use HOMER; based on its "Distance to TSS" column, you can define the peaks as promoter or enhancer.


If something is far from a promoter, it's not necessarily an enhancer.


I agree with your point, and thank you for the clarification. To define a peak as an enhancer, it needs to overlap an enhancer mark or a DHS site from the given cell type.


A DHS is just an open-chromatin region, so it can be a CTCF binding site or an inactive open-chromatin region. To be called an enhancer, a region should carry enhancer marks such as H3K27Ac + H3K4Me1, or bidirectional CAGE tags, etc.

3.7 years ago
e.rempel ★ 1.0k

Hi,

If you have the gene model of your organism (usually in .gtf or .gff format, or possibly as an R package from Bioconductor), you can use R to find overlaps between the called peaks and specific features of your genome (genes, exons, promoters). As far as I know, these gene models don't contain the coordinates of enhancers. In that case, you could use the methods geek_y provided and add enhancers to your model. Both the called peaks and the gene model can be read into R as GRanges objects. There are many methods you can apply to GRanges objects, especially findOverlaps, which could be useful to you.
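
To show what an overlap query of this kind returns, here is a small Python sketch that produces pairs of (query index, subject index) for overlapping intervals. It is not the Bioconductor function itself, just the same idea; the function name and the toy intervals are hypothetical, and it uses BED-style half-open coordinates, whereas GRanges objects use 1-based closed ranges.

```python
from collections import defaultdict

def find_overlaps(query, subject):
    """Return sorted (query_index, subject_index) pairs of overlapping intervals.

    Both inputs are lists of (chrom, start, end) in BED-style coordinates.
    """
    # Group subject intervals by chromosome so each query only scans its own.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(subject):
        by_chrom[chrom].append((start, end, j))

    hits = []
    for i, (chrom, q_start, q_end) in enumerate(query):
        for s_start, s_end, j in by_chrom.get(chrom, []):
            if s_start < q_end and q_start < s_end:
                hits.append((i, j))
    return sorted(hits)

# Toy peaks and features, made up for illustration only.
peaks = [("chr1", 100, 200), ("chr2", 50, 60)]
features = [("chr1", 150, 300), ("chr1", 200, 400), ("chr2", 10, 55)]
print(find_overlaps(peaks, features))
```

Each pair tells you which peak hit which feature, so you can look up the feature type (promoter, exon, enhancer) for every peak afterwards.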

HTH


OP already mentioned that it's human lung endothelial cells.
