How to define open or close gene in ATAC-seq?
Entering edit mode
10 months ago
gglim ▴ 140

May I ask how do you use your ATAC-seq data for post-alignment analysis?

Here's what I'm doing and I have a few problems:

  • For example, there are two groups of samples, WT and KO. In order to define the peak status, I merge all bam files (all replicates of two groups) and call a TOTAL peak using MACS2.

  • Then in R, WT and KO peak file is used to define peaks status by ChIPpeakAnno::findOverlappingPeaks, respectively. if there's overlap between WT and TOTAL, this TOTAL peak is defined as "Open in WT".

  • ChIPpeakAnno::annoPeaks do the annotation part with parameters (bindingType = "startSite", bindingRegion = c(-3000,3000)).

Problem: A gene can be annotated by many peaks, which show different status, eg. "OpenToClose", "CloseToOpen","BothOpen","NeitherOpen".

Does it make sense if I use these genes for further analysis? such as GO enrichment, their expression levels in RNAseq. If not, how to do it in a better way?

ChIPpeakAnno ATAC-seq MACS2 • 553 views
Entering edit mode
10 months ago
ATpoint 65k

In my opinion the only truely relevant region to define a gene as "open" or "closed" is the proximal promoter region which typically is approximated as the 500bp upstream of the annotated transcription start site(s). For a confident statement of peak status between conditions you would need to perform a proper differential analysis. For this you could look into the DiffBind or csaw packages, or alternatively merge all peaks, make a count matrix (samples are columns, regions are rows) and then plug this into differential analysis frameworks such as DESeq2. Just checking whether a peak is present in one condition but not in the other is a naive approach that will miss many peaks that are detectable in both conditions, yet with very different read counts, so you only probably get the very top differential peaks with such as approach.

Entering edit mode

Hi, ATpoint.

Thanks a lot for your reply! I have some questions about your advices. Do you mean I should change the bindingRegion parameter into c(-500,0) to annotate the peaks more precisely?

…and for the DA part, actually I did use the computeMatrix scale-regions to build a 'expression matrix', with the parameters -a 0 -b 0 -bs 1 --regionBodyLength 1. However, the result of following analysis using DESeq2 shows few significant differential peaks because of the high p-values/p-adjust. That’s why I turn to use the 'OpenClose' criteria.


Login before adding your answer.

Traffic: 2119 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6