We have ATAC-seq and RNA-seq data from two sample groups (matched, n=3 per assay and group), which we used for differential analysis with rather stringent criteria (FDR < 1%, testing against an absolute fold change of 2 using
edgeR). The task now is to assign differential ATAC-seq regions to differentially expressed genes (DEGs).
The naive approach I tried is to assign each differential ATAC-seq peak to the nearest DEG, provided both lie in the same topologically associating domain (TAD), with TADs taken from a closely related cell type. Everything was basically done with a combination of BEDtools
Doing so, 75% of the differential ATAC-seq peaks could be assigned to a DEG. The distance to the nearest DEG (in kb) is distributed as follows (quantiles):
       10%       25%       50%       75%       90%       95%       99%
    0.0000    0.8330   47.4660  162.1232  362.3022  546.6986 1038.2000
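To make the assignment rule concrete, here is a minimal pure-Python sketch of "nearest DEG within the same TAD" on toy data. It is not the BEDtools workflow I actually ran. The coordinates, names, and the helper functions `interval_distance`, `find_tad` and `assign_peaks` are all made up for illustration. It also assumes that a peak belongs to the TAD containing its midpoint and that a DEG counts as "in the TAD" if its interval overlaps it; other definitions (e.g. using the TSS only) are just as defensible.

```python
from statistics import median

def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))

def find_tad(tads, chrom, pos):
    """Return the first TAD (chrom, start, end) containing pos, or None."""
    for t_chrom, t_start, t_end in tads:
        if t_chrom == chrom and t_start <= pos < t_end:
            return (t_chrom, t_start, t_end)
    return None

def assign_peaks(peaks, degs, tads):
    """Map each peak name to (nearest DEG name, distance in bp) or None."""
    result = {}
    for p_chrom, p_start, p_end, p_name in peaks:
        # Assumption: the peak midpoint decides which TAD the peak is in.
        tad = find_tad(tads, p_chrom, (p_start + p_end) // 2)
        best = None
        if tad is not None:
            _, t_start, t_end = tad
            for g_chrom, g_start, g_end, g_name in degs:
                # Assumption: a DEG is a candidate if it overlaps the peak's TAD.
                if g_chrom != p_chrom or g_end <= t_start or g_start >= t_end:
                    continue
                d = interval_distance(p_start, p_end, g_start, g_end)
                if best is None or d < best[1]:
                    best = (g_name, d)
        result[p_name] = best
    return result

# Toy data (hypothetical coordinates, BED-style half-open intervals).
tads = [("chr1", 0, 1_000_000), ("chr1", 1_000_000, 2_000_000)]
degs = [("chr1", 200_000, 250_000, "geneA"),
        ("chr1", 1_050_000, 1_100_000, "geneB")]
peaks = [
    ("chr1", 210_000, 210_500, "peak1"),      # overlaps geneA
    ("chr1", 900_000, 900_400, "peak2"),      # linearly closer to geneB, but in geneA's TAD
    ("chr1", 1_500_000, 1_500_300, "peak3"),  # second TAD, only geneB is a candidate
    ("chr2", 100, 600, "peak4"),              # no TAD annotated, stays unassigned
]

assignments = assign_peaks(peaks, degs, tads)
assigned = [hit for hit in assignments.values() if hit is not None]
fraction_assigned = len(assigned) / len(peaks)
median_kb = median(d / 1000 for _, d in assigned)

for name, hit in assignments.items():
    print(name, hit)
print("fraction assigned:", fraction_assigned, "median distance (kb):", median_kb)
```

The `peak2` case shows what the TAD constraint changes compared with a plain nearest-gene assignment: the linearly closest DEG sits across a TAD boundary and is therefore skipped.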
Would you put trust in this kind of naive assignment? How do you typically approach this task?
There is a tool, InTAD, on Bioconductor for enhancer/gene assignment, but from the paper I understand that n=3 per group (so 6 in total) does not give much power for its correlation-based approach, so I tried the above approach first.
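As a rough illustration of why 6 samples worry me for anything correlation-based, here is a small simulation sketch. It is not InTAD's actual procedure. It assumes two completely unrelated features with independent Gaussian values across 6 samples, which is a simplification, and asks how large the absolute Pearson correlation gets by chance alone. The names `pearson_r` and `threshold_95` are mine.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples = 6           # 3 per group, 2 groups
n_trials = 20_000

# Assumption: signal and expression are unrelated Gaussian noise (pure null).
null_abs_r = sorted(
    abs(pearson_r([rng.gauss(0, 1) for _ in range(n_samples)],
                  [rng.gauss(0, 1) for _ in range(n_samples)]))
    for _ in range(n_trials)
)

# Empirical 95th percentile of |r| under the null.
threshold_95 = null_abs_r[int(0.95 * n_trials)]
print("95% of null |r| values fall below", round(threshold_95, 3))
```

Under these assumptions a single peak/gene pair needs |r| above roughly 0.8 just to beat chance at the 5% level, and that is before any multiple-testing correction over all candidate pairs.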
I am aware that these kinds of assignments, without additional data from C-technologies (Hi-C, 4C/5C-seq, etc.), have quite a high rate of false assignments; still, this is what we have so far. I would be especially interested in your experience with these kinds of assignments.