I am new to these concepts in biology and need some help understanding them. My main concern is how to find the enhancer of my gene of interest, specifically the sequence of the enhancer. I am working on a project where I will attempt CRISPR-mediated deletion of the promoter region and enhancer region of my gene of interest, FOXN2, in tamoxifen-resistant MCF7 breast cancer cells. I already have the sequence for the promoter region from the UCSC Genome Browser, but I'm not sure whether the enhancer region sequence can just be looked up there or anywhere else. Do I need to run an experiment to find the enhancer region myself? I have seen many mentions of ChIP-Seq, and that I may need to do it to get the data to find my enhancer, but I am just not sure. I plan to use the sequences of the enhancer and promoter regions of my gene of interest to design gRNAs for CRISPR. Thank you to anyone who can help.
This is actually not a straightforward question. One gene can be regulated by multiple enhancers and one enhancer can regulate multiple genes. Furthermore, enhancer activity is oftentimes cell-type specific.
The only way to really identify whether a given region is a bona fide enhancer for your gene of interest is through experimental validation (e.g. CRISPR perturbation).
To identify candidate enhancers, H3K27ac ChIP-Seq and/or ATAC-seq (chromatin accessibility) signal is typically used. To predict which gene(s) an enhancer is associated with, many studies simply apply a distance threshold (e.g. every gene whose promoter or TSS lies within 50 kb or 100 kb of the enhancer) or, more conservatively, select only the closest gene. Such predictions perform OK, but they are far from ideal.
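To make the two heuristics above concrete, here is a minimal Python sketch. The gene names, coordinates, and function names are all made up for illustration (they are not real FOXN2 annotations), and distance is measured from the enhancer midpoint to the TSS, which is just one reasonable convention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (bp)


@dataclass
class Enhancer:
    name: str
    chrom: str
    start: int
    end: int

    @property
    def midpoint(self) -> int:
        return (self.start + self.end) // 2


def genes_within(enh: Enhancer, genes: List[Gene], max_dist: int = 100_000) -> List[str]:
    """Distance-threshold rule: every gene whose TSS is within max_dist bp."""
    return [
        g.name
        for g in genes
        if g.chrom == enh.chrom and abs(g.tss - enh.midpoint) <= max_dist
    ]


def closest_gene(enh: Enhancer, genes: List[Gene]) -> Optional[str]:
    """Closest-gene rule: the single gene with the nearest TSS on the same chromosome."""
    same_chrom = [g for g in genes if g.chrom == enh.chrom]
    if not same_chrom:
        return None
    return min(same_chrom, key=lambda g: abs(g.tss - enh.midpoint)).name


# Hypothetical toy annotation, not real coordinates.
genes = [
    Gene("GENE_A", "chr2", 1_000_000),
    Gene("GENE_B", "chr2", 1_080_000),
    Gene("GENE_C", "chr2", 1_400_000),
]
candidate = Enhancer("cand_1", "chr2", 1_049_000, 1_051_000)

print(genes_within(candidate, genes, max_dist=100_000))
print(closest_gene(candidate, genes))
```

In this toy case the threshold rule links the candidate to two genes while the closest-gene rule picks only one, which is exactly why these assignments are treated as candidates rather than answers.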
In any case, many enhancer databases have been built on predictions like these:
- HACER: http://bioinfo.vanderbilt.edu/AE/HACER/
- EnhancerAtlas: http://www.enhanceratlas.org/indexv2.php
- Vista Enhancer browser: https://enhancer.lbl.gov/
And superenhancer databases too:
- dbSUPER: http://asntech.org/dbsuper/index.php
However, again, these only give you candidate enhancer-gene associations, based on prediction models that are OK but not great. Experimental validation remains necessary.
Better prediction models are still actively being developed; the most recent enhancer-gene association prediction model, which integrates chromatin accessibility, H3K27ac signal, and Hi-C contact data, was published just last month: Fulco, Nasser, et al. Nature Genetics, 2019, https://www.ncbi.nlm.nih.gov/pubmed/31784727 - This paper (based mainly on work done at the Broad Institute) is worth checking out to get a better idea of enhancer-gene associations and the challenges of identifying them.
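As I understand that paper, the core idea is an "activity-by-contact" score: each candidate element's activity (roughly, the geometric mean of its accessibility and H3K27ac signal) is multiplied by its Hi-C contact frequency with the gene's promoter, and then divided by the sum of that product over all candidate elements near the gene. The sketch below is only a toy illustration of that arithmetic with made-up numbers and hypothetical names; it is not the authors' pipeline, and it skips all the normalization and windowing details described in the paper.

```python
import math
from typing import Dict


def activity(accessibility: float, h3k27ac: float) -> float:
    """Toy activity: geometric mean of the two signals (an assumption of this sketch)."""
    return math.sqrt(accessibility * h3k27ac)


def abc_like_scores(
    activities: Dict[str, float], contacts: Dict[str, float]
) -> Dict[str, float]:
    """Share of the total activity-times-contact signal at one gene contributed by each element."""
    weighted = {e: activities[e] * contacts[e] for e in activities}
    total = sum(weighted.values())
    if total == 0:
        return {e: 0.0 for e in weighted}
    return {e: w / total for e, w in weighted.items()}


# Made-up signals for three candidate elements around one hypothetical gene.
activities = {
    "elem_1": activity(4.0, 9.0),   # strong signal
    "elem_2": activity(1.0, 1.0),   # weak signal, but close to the promoter
    "elem_3": activity(16.0, 4.0),  # strong signal, but rarely contacts the promoter
}
contacts = {"elem_1": 0.5, "elem_2": 1.0, "elem_3": 0.125}

scores = abc_like_scores(activities, contacts)
for name, score in scores.items():
    print(name, round(score, 3))
```

The point of the toy numbers is that a strong but poorly contacting element can end up with the same score as a weak nearby one, which a pure distance rule would not capture. The scores are still predictions, so the top-ranked elements are the ones to prioritize for CRISPR validation, not confirmed enhancers.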