Question: ATAC-seq: differential peaks using macs2 bdgdiff yields strange results
gravatar for stephenz
6 weeks ago by
stephenz0 wrote:

Hi all, I'm relatively new to ATAC-seq and bioinformatics in general. I am processing an ATAC-seq dataset (have treatment and control), in particular looking to find differential peaks and then seek for TF binding sites by motif finding and JASPAR.

  • Reads were trimmed, aligned to reference genome using bowtie2, and filtered
  • Peaks called using macs2. For instance:

macs2 callpeak --nomodel -t Sample_WT_dedup.bam -n Sample_WT --keep-dup all --gsize 11250000000 --shift -37 --extsize 73 --outdir macs2_output -B

  • The above was done for both the WT (control) and treatment data. Then I tried to use bdgdiff to call differential peaks, following the guidelines here

    macs2 bdgdiff --t1 Sample_LMN_output/macs2_output/Sample_LMN_treat_pileup.bdg
    --t2 Sample_WT_output/macs2_output/Sample_WT_treat_pileup.bdg 
    --c1 Sample_LMN_output/macs2_output/Sample_LMN_control_lambda.bdg 
    --c2 Sample_WT_output/macs2_output/Sample_WT_control_lambda.bdg 
    --d1 15430860 --d2 23095133 -g 60 -l 120 --outdir WT_LMN_diff --o-prefix WT_LMN

Here, I have taken --d1 and --d2 to be respectively the number of tags encountered by macs2 in the respective samples (extracted from .xls output). From what I understand, MACS2 uses these inputs for read-count normalization so the fact that I have more reads in WT than treatment samples should be corrected for.

However, the output is otherwise: only 223 peaks are enriched in the treatment (relative to WT), whilst ~10000 peaks are enriched in the WT (relative to treatment).

Furthermore, some of the log-likelihood ratios are crazy... see last column

chr2    19825317        19825545        WT_LMN_cond1_89 -632933.75000
chr2    19826135        19826362        WT_LMN_cond1_90 -647127.06250
chr2    19827419        19827635        WT_LMN_cond1_91 -683938.56250
chr2    19832164        19832295        WT_LMN_cond1_92 -2365761.25000
chr2    19832486        19832752        WT_LMN_cond1_93 -736282.37500
chr2    22930658        22930814        WT_LMN_cond1_94 -1535228.75000

Seems like there is still the effect of the amount of reads in each sample. I've looked around for information regarding this, but I haven't really found anything yet. To my understanding MACS2 bdgdiff should be able to deal with normalization itself (since it is taking the --d1 and --d2 params).

Am I correct that MACS2 does its own read count normalization? Or does this have to be a manual step ahead of time?

Would be grateful for any help :)


atac-seq macs2 • 246 views
ADD COMMENTlink modified 14 days ago by firatuyulur190 • written 6 weeks ago by stephenz0

If the treatment isn't expected to affect open regions (but only transcription factor binding) then you'll be hard pressed to find differences with MACS2. The general strategy would be to use MACS2 to find open chromatin and then use footprinting (e.g., with wellington) to find TF binding sites within those open regions. My presumption is that the regions will be open regardless of the treatment, its the footprints that will differ (this may change if your TF affects the openness of a region).

ADD REPLYlink written 6 weeks ago by Devon Ryan77k

I recommend against using macs2 for differential peak analysis. The point with MACS is that it is designed for ChIP-seq, where typically the summit read count is the critical parameter to be used for the analysis. In ATAC-seq, the assay creates cutting events all over the open chromatin regions. In my understanding, the summit is not always very informative about changes in accessability, as these can happen on the edges of a region. Better call peaks on both your conditions, merge and create a consensus peak list. Then use featureCounts or a similar tool to count reads over these regions and feed the count matrix into DESeq2. This of course only makes sense if you have (enough) replicates. If you have no replicates, thresholding the log2-fold change at say 1.5x may give you a rough idea in which regions things are changing upon treatment.

ADD REPLYlink written 5 weeks ago by ATpoint3.2k
gravatar for firatuyulur
14 days ago by
firatuyulur190 wrote:

I was working with MACS2 a lot last year and had similar issues. I cannot give a straight answer to this question but while calling my peaks from bam files I remember using -B --SPMR which asks MACS2 to generate pileup signal file of 'fragment pileup per million reads' in bedGraph format. Then you can compare the normalised bdg files. Or there are tools made for differential binding calculation like DiffBind where the tool normalises internally. I hope this helps.

ADD COMMENTlink written 14 days ago by firatuyulur190
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 917 users visited in the last hour