Question

Assignment of distal regions to genes

4.7 years ago

ATpoint 82k

We have ATAC-seq and RNA-seq data from two sample groups (matched, n = 3 per assay and group), which we used for differential analysis with rather stringent criteria (FDR < 1%, testing against a fold-change threshold of 2, i.e. null hypothesis |FC| ≤ 2, using glmTreat in edgeR). The task now is to assign differential ATAC-seq regions to differentially expressed genes (DEGs).

The naive approach I tried is to assign each differential ATAC-seq peak to the nearest differentially expressed gene, provided both lie in the same topologically associating domain (TAD), using TAD calls from a closely related cell type. Everything was done with a combination of BEDtools intersect and closest.
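The same logic can be sketched in plain Python as follows. This is a hypothetical illustration, not the BEDtools commands actually used: it assumes each gene is represented by a 1-bp TSS interval, that a peak belongs to the TAD containing its midpoint, and that distances are the bp gap between intervals (0 if they overlap). All names are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int  # 0-based, half-open (BED convention)
    end: int
    name: str


def containing_tad(chrom: str, pos: int, tads: List[Feature]) -> Optional[Feature]:
    """Return the TAD whose [start, end) contains position `pos`, or None."""
    for tad in tads:
        if tad.chrom == chrom and tad.start <= pos < tad.end:
            return tad
    return None


def gap_bp(a: Feature, b: Feature) -> int:
    """Gap between two half-open intervals on the same chromosome; 0 if they overlap."""
    return max(0, b.start - a.end, a.start - b.end)


def assign_peaks(peaks, deg_tss, tads):
    """Map each differential peak to the nearest DEG TSS within the same TAD.

    Returns {peak name: (gene name, distance in bp)}. Peaks outside any TAD,
    or without a DEG in their TAD, stay unassigned.
    """
    assignment = {}
    for peak in peaks:
        tad = containing_tad(peak.chrom, (peak.start + peak.end) // 2, tads)
        if tad is None:
            continue
        # Only DEGs whose TSS falls into the same TAD are candidates.
        candidates = [g for g in deg_tss
                      if containing_tad(g.chrom, g.start, tads) == tad]
        if not candidates:
            continue
        best = min(candidates, key=lambda g: (gap_bp(peak, g), g.name))
        assignment[peak.name] = (best.name, gap_bp(peak, best))
    return assignment


# Toy example: GeneB is closer to p2 but lies in another TAD, so p2 goes to GeneA.
tads = [Feature("chr1", 0, 100_000, "TAD1"), Feature("chr1", 100_000, 300_000, "TAD2")]
deg_tss = [Feature("chr1", 5_000, 5_001, "GeneA"), Feature("chr1", 150_000, 150_001, "GeneB")]
peaks = [Feature("chr1", 40_000, 40_500, "p1"),
         Feature("chr1", 95_000, 95_500, "p2"),
         Feature("chr1", 400_000, 400_500, "p3")]
result = assign_peaks(peaks, deg_tss, tads)
print(result)  # {'p1': ('GeneA', 34999), 'p2': ('GeneA', 89999)}
```

The linear scans keep the sketch short; for genome-wide data an interval index (or BEDtools itself, as in the post) is the practical choice.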

Doing so, 75% of differential ATAC-seq peaks could be assigned to a DEG. The distance to the nearest DEG (in kb) has the following quantiles:

 10%       25%       50%       75%       90%       95%       99% 
   0.0000    0.8330   47.4660  162.1232  362.3022  546.6986 1038.2000

Would you trust this kind of naive assignment? How do you typically approach this task? There is a Bioconductor package, InTAD, for enhancer/gene assignment, but from the paper I understand that its correlation-based approach is underpowered with n = 3 per group (6 samples in total), so I tried the approach above first.
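To put the power concern into numbers, assume a plain Pearson correlation test across all six samples (InTAD's exact procedure may differ). Inverting the usual t statistic gives the smallest |r| that reaches significance with n − 2 degrees of freedom:

$$
t = r\sqrt{\frac{n-2}{1-r^{2}}}
\quad\Longrightarrow\quad
r_{\text{crit}} = \frac{t_{\alpha/2,\,n-2}}{\sqrt{t_{\alpha/2,\,n-2}^{2} + n - 2}},
\qquad
n = 6,\; t_{0.025,\,4} \approx 2.776
\;\Rightarrow\;
r_{\text{crit}} \approx \frac{2.776}{\sqrt{7.706 + 4}} \approx 0.81 .
$$

So a single peak-gene pair needs |r| ≳ 0.81 at α = 0.05 before any correction for testing many pairs, which only raises the bar further.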

I am aware that these kinds of assignments, without additional data from C-technologies (Hi-C, 4C/5C-seq etc.), have quite a high rate of false assignments; still, this is what we have so far. I would be especially interested in your experience with these kinds of assignments.

Suggestions appreciated.

Tags: ATAC-seq, RNA-seq, TAD

Written 4.7 years ago by ATpoint; last updated 4.7 years ago by GouthamAtla

I would also consider splitting the genes not only by absolute fold change but by direction, and checking whether up-regulated genes get more ATAC calls with gained accessibility (and vice versa for the down-regulated genes). In principle the approach is good IMO, as many regulatory regions lie within 50 kb of the promoter. Of course, you will miss (potentially meaningful) longer-range interactions, but without the C-technologies you mentioned it might be difficult to call those. You could also consider using ChIP-seq to show an enrichment of, for example, transcription factor occupancy at these regions; this makes it quite likely that the regions in the vicinity of your genes contribute to their transcriptional regulation.
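The direction check suggested above could be sketched like this (a hypothetical illustration, not code from the thread): for each assigned peak-gene pair, ask whether the peak's accessibility log fold change and the gene's expression log fold change share a sign, then compare the number of concordant pairs against a 50:50 null with an exact one-sided binomial test. The 0.5 null and the independence of pairs are assumptions; several peaks assigned to the same gene are not independent.

```python
from math import comb


def concordance_test(pairs):
    """Test whether peak and gene log fold changes agree in sign more than by chance.

    pairs: iterable of (peak_logFC, gene_logFC); pairs with an exact zero are skipped.
    Returns (n_concordant, n_pairs, p), where p = P(X >= n_concordant)
    for X ~ Binomial(n_pairs, 0.5), i.e. an exact one-sided binomial test.
    """
    same_sign = [(a > 0) == (b > 0) for a, b in pairs if a != 0 and b != 0]
    n = len(same_sign)
    k = sum(same_sign)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Toy data: 7 of 8 assigned pairs change in the same direction.
pairs = [(2.1, 1.5), (1.3, 2.2), (-2.0, -1.1), (-1.7, -3.0),
         (2.5, -1.2), (1.9, 1.4), (-2.2, -1.8), (1.1, 2.6)]
k, n, p = concordance_test(pairs)
print(k, n, p)  # 7 8 0.03515625
```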

Reply by sim.j.baum, 4.7 years ago

Answer 1 (2019-07-31)

4.7 years ago

GouthamAtla 12k

You have differential ATAC and mRNA. First, I would check how often there is a change in ATAC signal at the promoters of the differential genes. If it happens often, you can create controls and validate your assignments. For example:

If you assign a differential ATAC region to a differential gene (at the mRNA level), you can take a control gene that is distance-matched to the ATAC region (within the same TAD) and show that the promoters of assigned genes change in accessibility more often than those of distance-matched control (non-assigned) genes.
You can also use eQTL data from a relevant tissue (if there is any) and show that assigned genes are more often eGenes for eQTLs in the ATAC region than distance-matched control genes are.
You can also use Hi-C (or HiChIP/PCHi-C) data from relevant tissues, or from other tissues (not ideal), and show that interactions with assigned genes are enriched compared with distance-matched control genes.
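The control scheme above can be sketched as follows. This is a hypothetical illustration, not GouthamAtla's code: for each assignment, pick the non-assigned gene in the same TAD whose distance to the peak is closest to the assigned gene's distance, flag each gene by whatever evidence is used (promoter ATAC change, eGene status, Hi-C contact), and compare assigned versus control genes with a one-sided Fisher's exact test computed from the hypergeometric distribution. Function names and the toy counts are made up.

```python
from math import comb


def pick_distance_matched_control(assigned_dist, candidates):
    """Pick a control gene at a similar distance from the same peak.

    assigned_dist: distance (bp) between the peak and its assigned gene.
    candidates: list of (gene, distance_to_peak) for non-assigned genes in the same TAD.
    Returns the candidate whose distance is closest to assigned_dist, or None.
    """
    if not candidates:
        return None
    gene, _ = min(candidates, key=lambda c: (abs(c[1] - assigned_dist), c[0]))
    return gene


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (odds ratio > 1) for the 2x2 table [[a, b], [c, d]].

    Rows: assigned genes / control genes; columns: with evidence / without evidence.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    total = a + b + c + d
    with_evidence = a + c
    assigned = a + b
    upper = min(with_evidence, assigned)
    hits = sum(comb(with_evidence, x) * comb(total - with_evidence, assigned - x)
               for x in range(a, upper + 1))
    return hits / comb(total, assigned)


# Toy usage: control for a peak whose assigned DEG is 47,466 bp away.
control = pick_distance_matched_control(
    47_466, [("G1", 120_000), ("G2", 48_000), ("G3", 10_000)])
print(control)  # G2

# 8 of 10 assigned genes have evidence vs. 3 of 10 distance-matched controls.
p = fisher_greater(8, 2, 3, 7)
print(round(p, 4))  # 0.0349
```

Matching on distance is the key design choice: assigned genes tend to be close to their peaks, and proximity alone raises the chance of shared accessibility, eQTLs or Hi-C contacts, so an unmatched background would inflate the enrichment.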

Answer by GouthamAtla, 4.7 years ago

I think it is not quite correct to assign the differential peaks to differential genes directly. Instead, you can assign peaks to genes based on whatever evidence is available (eQTLs, Hi-C/HiChIP/PCHi-C, or a correlation analysis across tissues, e.g. using the Roadmap DHS dataset) and then show that differential peaks are assigned to differential genes more frequently than to other control genes in the same TAD.
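A correlation-based assignment of the kind mentioned above could look roughly like this. It is a hypothetical sketch, not the InTAD API and not a Roadmap-specific pipeline: it assumes accessibility and expression values measured across many samples in the same order, keeps only genes in the same TAD as the peak, and uses an arbitrary cutoff `min_r = 0.5` that would need calibrating (e.g. against permuted pairs) in practice.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; nan if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def correlate_within_tad(peak_signal, gene_expr, peak_tad, gene_tad, min_r=0.5):
    """Assign peaks to same-TAD genes whose expression tracks the peak's accessibility.

    peak_signal, gene_expr: {name: [value per sample]} with a shared sample order.
    peak_tad, gene_tad: {name: TAD id}.
    Returns {peak: [(gene, r), ...]} with r >= min_r, sorted by decreasing r.
    """
    assignments = {}
    for peak, signal in peak_signal.items():
        hits = []
        for gene, expr in gene_expr.items():
            if gene_tad.get(gene) != peak_tad.get(peak):
                continue  # only consider genes in the peak's TAD
            r = pearson(signal, expr)
            if r >= min_r:  # nan compares False, so constant profiles drop out
                hits.append((gene, r))
        if hits:
            assignments[peak] = sorted(hits, key=lambda h: -h[1])
    return assignments


# Toy data over five samples: A tracks p1, B is anti-correlated, C is in another TAD.
peak_signal = {"p1": [1, 2, 3, 4, 5]}
gene_expr = {"A": [2, 4, 6, 8, 10], "B": [5, 4, 3, 2, 1], "C": [1, 2, 3, 4, 5]}
peak_tad = {"p1": "T1"}
gene_tad = {"A": "T1", "B": "T1", "C": "T2"}
result = correlate_within_tad(peak_signal, gene_expr, peak_tad, gene_tad)
print(result)  # {'p1': [('A', 1.0)]}
```

The differential peaks from the six-sample experiment would then be checked against these independently derived assignments, rather than being used to create them.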

Reply by GouthamAtla, 4.7 years ago

Thank you for the comments. I see the problem you mention: the a priori assumption that differential gene expression must be due to differential chromatin accessibility of a distal region probably underestimates the biological complexity, leaving out things like small-RNA-based regulation and post-transcriptional processes in general. I will check whether I can get my hands on suitable larger datasets for a correlation analysis. The ImmGen project has matched RNA-seq and ATAC-seq data across the entire murine hematopoietic system, which would probably give sufficient power for a correlation analysis, e.g. with the InTAD package, and is closely related to our data. I will try that and report back on my experience.

Reply by ATpoint, 4.7 years ago

I do not know much about the hematopoietic system, but this PCHi-C paper has data from 17 blood cell types:

https://www.cell.com/cell/fulltext/S0092-8674(16)31322-8

Reply by GouthamAtla, 4.7 years ago

Will check, thanks.

Reply by ATpoint, 4.7 years ago