I have a file with the coordinates of possible enhancers in reprogramming mouse cells (fibroblasts and iPS cells, mm10 genome). The datasheet looks like this, where RE is the enhancer and TG/TF are the associated target genes/transcription factors:
TF      TG     Score     FDR       REs
Pou5f1  L1td1  1.09E+06  9.24E-06  chr4_98727908_98728497
Pou5f1  Cd109  580062    9.24E-06  chr9_78623220_78624062;chr9_78615332_78616035
Sox2    L1td1  428168    9.24E-06  chr4_98727908_98728497;chr4_98726641_98726938
I want to confirm that they are enhancers and categorize them as known or novel enhancers, like in this figure. Is there a way to do this?
I was thinking of comparing the regions with a known enhancer database such as EnhancerAtlas (but it looks like they use mm9) or H3K4me1 ENCODE data (although there isn't a track for iPSCs). Is there a way to find/count the union set in R, or do I need to use bedtools?
The question here, for me, is not so much how to do this technically but rather what is considered an enhancer.
After all, there is a variety of methods used to call "enhancers", a term that, depending on context, means either any regulatory element or one carrying certain marks. ATAC-seq measures open chromatin, ChIP-seq can probe associated marks such as H3K4me1 and H3K27ac, and CAGE-seq can identify active transcription of non-promoter elements. You will find literature calling enhancers with any of these methods but without any functional data on whether the called region indeed has regulatory activity.
As such, there is much uncertainty when pulling from any database. You may want to first check whether there are datasets or databases that use the exact assay you used to call enhancers, and then compare with those. There is naturally limited overlap between independent methods, and just because a CAGE-seq database misses an enhancer you called does not necessarily mean it is unreported; it may be in all of the ChIP-based databases.
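Once you have settled on a reference set, the overlap itself is simple. Here is a minimal Python sketch, assuming the reference regions are already in mm10 coordinates (an mm9 set would first need lifting over, e.g. with UCSC liftOver) and that the `chr_start_end` strings in your REs column are half-open intervals. The function names and the `known_enhancers` list are hypothetical placeholders, not real database entries.

```python
from collections import defaultdict

def parse_re(token):
    """Split 'chr4_98727908_98728497' into (chrom, start, end).
    rsplit keeps chromosome names that contain underscores intact."""
    chrom, start, end = token.rsplit("_", 2)
    return chrom, int(start), int(end)

def build_index(known):
    """Group reference intervals by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in known:
        by_chrom[chrom].append((start, end))
    return by_chrom

def overlaps(region, index):
    """True if the region shares at least 1 bp with any reference interval
    (intervals treated as half-open, which is an assumption)."""
    chrom, start, end = region
    return any(s < end and start < e for s, e in index.get(chrom, []))

def classify(res_field, index):
    """Label each ';'-separated RE in one REs cell as 'known' or 'novel'."""
    return {
        token: "known" if overlaps(parse_re(token), index) else "novel"
        for token in res_field.split(";")
    }

# Hypothetical reference regions, for illustration only
known_enhancers = [("chr4", 98727000, 98728000), ("chr9", 78610000, 78615000)]
index = build_index(known_enhancers)

# REs cells copied from the datasheet in the question
res_cells = [
    "chr4_98727908_98728497",
    "chr9_78623220_78624062;chr9_78615332_78616035",
    "chr4_98727908_98728497;chr4_98726641_98726938",
]
labels = {}
for cell in res_cells:
    labels.update(classify(cell, index))

n_known = sum(1 for v in labels.values() if v == "known")
print(labels)
print(f"{n_known} of {len(labels)} unique REs overlap the reference set")
```

The linear scan per chromosome is fine for a few thousand regions; for genome-wide sets, bedtools intersect or the GenomicRanges package in R will be much faster. Keep in mind that "novel" here only means "absent from this particular reference set", which is exactly the caveat above.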
Is there a common identifier or piece of information shared between your "possible enhancer coordinates" datasheet and databases such as EnhancerAtlas that could be used to map between the two?