I have been working with single-cell ATAC-seq data and want to identify the genes closest to peaks that are differentially accessible across cell types. I have been analyzing the data with the Signac package, which is a great package for pre-processing and analysis of scATAC-seq data (https://github.com/timoast/signac). To associate peaks with genes, Signac provides the ClosestFeature function. It finds the gene closest to each peak in the genomic annotation you supply, using GenomicRanges functions (I believe distanceToNearest rather than findOverlaps).

I have tried two annotations from the EnsDb.Hsapiens.v86 package: protein-coding genes only, and all gene biotypes. As expected, they give different associations. With protein-coding genes only, some peaks are assigned to genes quite far away (even >1 Mbp). With the full annotation, many peaks are assigned to genes whose bodies lie right next to the peak but which don't tell me much.

My question is what the best practice is here, if there is one. Should I map peaks to all gene biotypes, or are protein-coding genes the most likely to be associated with this signal? Is there a specific combination of gene biotypes you would use? Should I also map these peaks to known genomic regulatory regions, and if so, where should I look for those?
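To make the trade-off concrete, here is a minimal Python sketch of nearest-gene assignment with an optional biotype filter. It is not Signac's implementation (ClosestFeature works on GRanges objects in R). The Gene/Peak types, the closest_gene helper and the toy coordinates are all hypothetical, chosen only to show how restricting the annotation to protein-coding genes can push the nearest hit more than 1 Mbp away. Coordinates are treated as 1-based and closed, and the distance is the number of bases between two ranges (0 if they overlap or touch).

```python
from typing import NamedTuple, Optional, Set, Tuple


class Gene(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    name: str
    biotype: str


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases strictly between two closed intervals; 0 if they overlap or touch."""
    if b_start > a_end:
        return b_start - a_end - 1
    if a_start > b_end:
        return a_start - b_end - 1
    return 0


def closest_gene(
    peak: Peak, genes: list, biotypes: Optional[Set[str]] = None
) -> Optional[Tuple[Gene, int]]:
    """Return (gene, distance) for the nearest gene on the peak's chromosome,
    optionally restricted to the given biotypes; None if there is no candidate."""
    best = None
    for g in genes:
        if g.chrom != peak.chrom:
            continue
        if biotypes is not None and g.biotype not in biotypes:
            continue
        d = gap(peak.start, peak.end, g.start, g.end)
        if best is None or d < best[1]:  # first gene wins ties
            best = (g, d)
    return best


# Hypothetical toy annotation and peak (not real genes or coordinates).
genes = [
    Gene("chr1", 1_000, 5_000, "GENE_A", "protein_coding"),
    Gene("chr1", 1_200_000, 1_210_000, "LNC_B", "lncRNA"),
    Gene("chr1", 2_500_000, 2_600_000, "GENE_C", "protein_coding"),
]
peak = Peak("chr1", 1_215_000, 1_215_500)

all_hit = closest_gene(peak, genes)
coding_hit = closest_gene(peak, genes, {"protein_coding"})
print(all_hit[0].name, all_hit[1])        # LNC_B 4999
print(coding_hit[0].name, coding_hit[1])  # GENE_A 1209999
```

With every biotype the peak lands on the adjacent lncRNA; with protein-coding genes only, the same peak is assigned to a gene about 1.2 Mbp away. This is the same pattern described above, and neither answer by itself says which gene the region actually regulates.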
Thanks a lot!