Lately I've been interested in predicting enhancer sites in hg38 using some ChIP-seq data I have.
Most of the reviews I've read say I should look for regions enriched for H3K4me1, H3K27ac and Pol that have little or no H3K4me3, so I'm focusing primarily on these marks.
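As a minimal sketch of that rule of thumb, the check below takes, for one candidate region, the fraction of the region covered by peaks of each mark. The mark names, the dictionary input and the 10% cutoff for "little" H3K4me3 are my own assumptions, not values taken from any review.

```python
from typing import Mapping

# Marks expected at active enhancers according to the rule of thumb above.
REQUIRED = ("H3K4me1", "H3K27ac", "Pol")
# Promoter-associated mark that should be absent or weak.
EXCLUDED = "H3K4me3"


def is_candidate_enhancer(coverage: Mapping[str, float],
                          max_excluded: float = 0.1) -> bool:
    """coverage maps mark name -> fraction (0..1) of the region covered by that mark's peaks.

    max_excluded is a hypothetical cutoff for "little" H3K4me3; tune it for your data.
    """
    has_required = all(coverage.get(mark, 0.0) > 0.0 for mark in REQUIRED)
    weak_excluded = coverage.get(EXCLUDED, 0.0) <= max_excluded
    return has_required and weak_excluded
```

For example, a region with `{"H3K4me1": 0.8, "H3K27ac": 0.6, "Pol": 0.3, "H3K4me3": 0.05}` passes, while the same region with H3K4me3 coverage of 0.5 does not.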
Originally I assumed I could simply overlap my narrowPeak files using bedtools and view the resulting regions in IGV, but the results are odd and don't make much sense: many of the regions still overlap a fair amount of H3K4me3.
Reading more of the literature, I see that distal enhancers are also defined as lying outside a ±5 kb window around the TSS and TSE, so I've tried to incorporate this into my analysis as well.
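The distal filter can be sketched in plain Python: widen each TSS/TSE position into a ±5 kb window (what `bedtools slop -i sites.bed -g genome.txt -b 5000` does to 1-bp features) and keep only peaks that overlap none of those windows (what `bedtools intersect -v` does). The tuple representation and the function names are illustrative assumptions; real data would come from your BED/narrowPeak files, using only their first three columns.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED.
Interval = Tuple[str, int, int]


def flank(sites: List[Tuple[str, int]], width: int = 5000) -> List[Interval]:
    """Turn 0-based point positions (e.g. TSS/TSE) into +/- width windows, clipped at 0."""
    return [(chrom, max(0, pos - width), pos + width + 1) for chrom, pos in sites]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def distal_only(peaks: List[Interval], windows: List[Interval]) -> List[Interval]:
    """Keep peaks that do not touch any TSS/TSE window."""
    return [p for p in peaks if not any(overlaps(p, w) for w in windows)]
```

With a single TSS at `("chr1", 1000)`, the window is `("chr1", 0, 6001)`, so a peak at `("chr1", 100, 200)` is dropped and one at `("chr1", 20000, 20500)` is kept. This brute-force scan is fine for illustration; bedtools does the same job far more efficiently on genome-scale files.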
So my question is: can this be done with bedtools, or do I need something else? If it can, what are the correct steps to produce what I'm looking for? Right now my process is roughly: download GENCODE TSS and TSE coordinates from UCSC and use slopBed to extend them by 5 kb in each direction; run intersectBed on Pol, H3K4me1 and H3K27ac; run another intersectBed on Pol, H3K4me1, H3K27ac and H3K4me3; and finally intersectBed the two resulting BED files with the -v option, so that only entries in A with no overlap in B are reported. But this seems too simple a process to give me the complicated results I'm after. Any ideas?
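The steps above can be sketched end to end in plain Python on toy intervals. `intersect` mimics the default output of `bedtools intersect -a A -b B` (the overlapping portion of each A feature), and `exclude` mimics `bedtools intersect -v`. One difference from the described process, labeled as my own choice: the sketch filters the three-mark regions against the H3K4me3 peaks directly instead of building a separate four-mark file first, and it also drops anything inside the TSS/TSE windows. Names and toy data are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping portions of A and B, like the default `bedtools intersect` output."""
    out = []
    for chrom_a, sa, ea in a:
        for chrom_b, sb, eb in b:
            if chrom_a == chrom_b and sa < eb and sb < ea:
                out.append((chrom_a, max(sa, sb), min(ea, eb)))
    return sorted(out)


def exclude(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Entries of A with no overlap in B, like `bedtools intersect -v`."""
    return [
        (c, s, e) for c, s, e in a
        if not any(c == cb and s < eb and sb < e for cb, sb, eb in b)
    ]


def candidate_enhancers(pol, k4me1, k27ac, k4me3, tss_windows):
    # Regions covered by all three "active enhancer" signals.
    active = intersect(intersect(pol, k4me1), k27ac)
    # Drop regions touching any H3K4me3 peak (promoter-like).
    no_k4me3 = exclude(active, k4me3)
    # Drop regions within +/- 5 kb of a TSS/TSE (keep distal only).
    return exclude(no_k4me3, tss_windows)


pol = [("chr1", 100, 500), ("chr1", 10000, 10400), ("chr1", 30000, 30600)]
k4me1 = [("chr1", 50, 450), ("chr1", 10100, 10500), ("chr1", 30100, 30500)]
k27ac = [("chr1", 150, 480), ("chr1", 10050, 10300), ("chr1", 30000, 30400)]
k4me3 = [("chr1", 30200, 30300)]
tss_windows = [("chr1", 0, 6001)]
```

On this toy data, `candidate_enhancers(pol, k4me1, k27ac, k4me3, tss_windows)` leaves only `("chr1", 10100, 10300)`: the region near position 300 falls inside the TSS window and the one near 30100 overlaps H3K4me3.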
I then generated a set of intergenic regions by adapting the method described at http://crazyhottommy.blogspot.com/2013/05/find-exons-introns-and-intergenic.html, but I'm still not sure how to continue the process with bedtools.
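For reference, intergenic regions are just the gaps between gene intervals on each chromosome, which is what `bedtools complement` computes from sorted gene intervals and a genome file of chromosome sizes. A minimal pure-Python sketch, assuming genes as (chrom, start, end) tuples and a hypothetical `chrom_sizes` dictionary; the result could then be intersected with the candidate peaks like any other BED file.

```python
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED.
Interval = Tuple[str, int, int]


def complement(genes: List[Interval], chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Gaps between gene intervals on each chromosome (overlapping genes are merged)."""
    out = []
    for chrom, size in chrom_sizes.items():
        spans = sorted((s, e) for c, s, e in genes if c == chrom)
        pos = 0  # end of the gene coverage seen so far
        for start, end in spans:
            if start > pos:
                out.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < size:
            out.append((chrom, pos, size))
    return out
```

For genes `[("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]` on a 1,000-bp `chr1`, this yields `(0, 100)`, `(300, 500)` and `(600, 1000)`; a chromosome with no genes is returned whole.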
Here are some related questions and discussions about predicting enhancer candidates that could give you some other options:
enhancer annotation for mouse mm9
Promoter Or Enhancer Regions Bed Format
The best definition to Enhancer and Promoter?
Chromatin state segmentation - direction of effect of the identified enhancer
I'm having a little trouble understanding ChromHMM. I've read the manual and its paper a few times. I binarized my BED files (H3K4me1, H3K27ac, Pol) to serve as the input for my model, and it produced several output files. How do I use these files to identify enhancers?
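To clarify what the binarized files represent: each row is a genomic bin (200 bp by default) and each column is a mark, with 1 meaning the mark's read count in that bin is significantly above background. Below is a simplified sketch of that idea, assuming a uniform Poisson background equal to the mean count per bin and a p-value cutoff of 1e-4. ChromHMM's BinarizeBed differs in details such as read shifting and optional control data, so treat this as an illustration only.

```python
import math
from typing import List


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def binarize(counts: List[int], pvalue: float = 1e-4) -> List[int]:
    """Call a bin 1 if its read count is unlikely under a uniform Poisson background."""
    if not counts:
        return []
    lam = sum(counts) / len(counts)  # assumed background: mean reads per bin
    return [1 if poisson_sf(c, lam) < pvalue else 0 for c in counts]
```

For per-bin counts `[0, 1, 0, 2, 1, 0, 30, 1, 0, 1]`, only the bin with 30 reads is called present. Files of this kind are what LearnModel consumes.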
Is there any way to include these three marks but exclude another (H3K4me3, for instance) when building a model with ChromHMM? I couldn't find anything about this.
Once you binarize the files, you use them as input to ChromHMM's LearnModel program. All ChromHMM does is find classes (states) of chromatin from a set of binarized inputs; it's up to you to decide how many states to learn and to interpret which of them might correspond to enhancers. I have some code at https://github.com/daler/enhancer-snakemake-demo that helps with this, including using ChromHMM's OverlapEnrichment with VISTA enhancers to aid interpretation. Example output is in the aptly named `example_output` dir.
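Once LearnModel has run, one way to shortlist enhancer-like states is to scan the learned emission probabilities for states where H3K4me1 and H3K27ac are high and H3K4me3 is low. This is also one possible answer to the question about excluding H3K4me3: keep it in the model and require it to be low in the states you pick. The sketch assumes the emissions have already been loaded into a dictionary; the 0.5/0.2 cutoffs and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence

# state number -> mark name -> emission probability (as learned by LearnModel)
Emissions = Dict[int, Dict[str, float]]


def enhancer_like_states(emissions: Emissions,
                         present: Sequence[str] = ("H3K4me1", "H3K27ac"),
                         absent: Sequence[str] = ("H3K4me3",),
                         hi: float = 0.5,
                         lo: float = 0.2) -> List[int]:
    """States whose 'present' marks are all >= hi and 'absent' marks are all <= lo."""
    return sorted(
        state for state, probs in emissions.items()
        if all(probs.get(m, 0.0) >= hi for m in present)
        and all(probs.get(m, 0.0) <= lo for m in absent)
    )


# Toy emission table (hypothetical values, not real ChromHMM output).
toy = {
    1: {"H3K4me1": 0.9, "H3K27ac": 0.8, "Pol": 0.4, "H3K4me3": 0.05},  # enhancer-like
    2: {"H3K4me1": 0.7, "H3K27ac": 0.9, "Pol": 0.8, "H3K4me3": 0.9},   # promoter-like
    3: {"H3K4me1": 0.02, "H3K27ac": 0.01, "Pol": 0.01, "H3K4me3": 0.01},  # background
}
```

Here `enhancer_like_states(toy)` returns `[1]`. The selected state numbers can then be used to pull the matching segments out of the segmentation output, and checked against known enhancers (for example with OverlapEnrichment and VISTA) before trusting them.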
Including/excluding marks is done when you call LearnModel, and often you have to play with those as well to get an interpretable set of states.