Question: Normalization of ATAC-seq data for a proper deeptools heatmap
kinalimeric (10 months ago) wrote:

Hi all,

I have four paired-end ATAC-seq libraries (2 replicates for each of 2 samples). I aligned the reads with Bowtie2, filtered out mitochondrial reads and duplicates using Picard, and then called peaks on the BAM files with MACS2. I also performed differential peak analysis using deeptools and filtered the results by FDR < 0.05 and abs(log2 fold change) > 2.

I then generated peak density heatmaps with deeptools. However, in the profile plots at the top of the heatmaps, the peak heights differ across the 4 files, even though I normalized the BAM files when converting them to bigWig with bamCoverage.

My questions are: Should the peak heights be the same, or is a slight difference acceptable? If not, how can I normalize the data? Should I normalize the BAM files and then call peaks again, and if so, which tool do you suggest? Or is DiffBind normalization okay? I am also really confused about coverage-file normalization versus peak normalization. Lastly, as described in the post "Normalization and differential analysis in ATAC-seq data", how can I downsample each sample?

I would appreciate it if you could explain these points.

[Image: peak signal heatmap]

ATpoint (10 months ago) wrote:

In some cases, normalizing only for sequencing depth is enough. Often it is not, because of differences in library composition and in signal-to-noise ratio. I prefer to scale my bigWig files (or whatever counts you want to normalize) with the normalization factors from edgeR. Code examples and more details are in the linked thread: A: ATAC-seq sample normalization (quantil normalization)
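The idea of scaling bigWigs with edgeR-style factors can be sketched in Python as below. This is a simplified re-implementation of TMM written for illustration, not edgeR's own calcNormFactors (which handles ties, weighting cut-offs and edge cases differently), so the factors may differ slightly. The helper names and the input (a raw count matrix of reads per consensus peak per sample) are assumptions. The reciprocal of the TMM-adjusted library size in millions is what would be passed to bamCoverage's --scaleFactor.

```python
import numpy as np


def tmm_factors(counts, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM normalization factors (illustrative, not edgeR itself).

    counts: 2-D array, rows = peaks, columns = samples (raw read counts).
    Returns one factor per sample, rescaled to a geometric mean of 1.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    # Reference sample: upper quartile of library-scaled counts closest to the mean
    uq = np.percentile(counts / lib, 75, axis=0)
    ref = int(np.argmin(np.abs(uq - uq.mean())))
    factors = np.ones(counts.shape[1])
    for j in range(counts.shape[1]):
        if j == ref:
            continue
        ok = (counts[:, j] > 0) & (counts[:, ref] > 0)  # log ratios need nonzero counts
        o, r = counts[ok, j], counts[ok, ref]
        n_o, n_r = lib[j], lib[ref]
        m = np.log2((o / n_o) / (r / n_r))  # M: log2 fold change vs reference
        a = 0.5 * np.log2((o / n_o) * (r / n_r))  # A: average log2 abundance
        var = (n_o - o) / (n_o * o) + (n_r - r) / (n_r * r)  # approx. variance of M
        n = m.size
        k_m, k_a = int(n * logratio_trim), int(n * sum_trim)
        rank_m = np.argsort(np.argsort(m))  # 0-based ranks (ties broken arbitrarily)
        rank_a = np.argsort(np.argsort(a))
        # Trim the most extreme M and A values, keep the "unchanged" core
        keep = (rank_m >= k_m) & (rank_m < n - k_m) & (rank_a >= k_a) & (rank_a < n - k_a)
        w = 1.0 / var[keep]
        factors[j] = 2.0 ** (np.sum(w * m[keep]) / np.sum(w))  # weighted trimmed mean of M
    return factors / np.exp(np.mean(np.log(factors)))


def bigwig_scale_factors(peak_counts, lib_sizes=None):
    """Reciprocal of the TMM-adjusted library size in millions.

    lib_sizes defaults to reads in peaks (column sums); total mapped reads
    could be supplied instead.
    """
    peak_counts = np.asarray(peak_counts, dtype=float)
    if lib_sizes is None:
        lib_sizes = peak_counts.sum(axis=0)
    size_factors = np.asarray(lib_sizes, dtype=float) * tmm_factors(peak_counts) / 1e6
    return 1.0 / size_factors
```

Each returned value would then go to one sample, e.g. `bamCoverage -b sample.bam -o sample.bw --scaleFactor <value>`, leaving `--normalizeUsing` unset so this factor is the only scaling applied.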

Devon Ryan (Freiburg, Germany; 10 months ago) wrote:

I wouldn't expect the peak heights to be identical; some amount of biological variation is normal. The goal of the normalization should instead be to bring the background level to roughly the same value in all samples.
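A minimal sketch of background-based scaling, assuming you have counted reads in large genomic bins (for example 10 kb tiles; the bin size and the input matrix are assumptions). Because most large bins are dominated by background reads, matching each sample's median bin count roughly equalizes the background. This is a deliberately simple median-based variant; csaw's approach instead applies TMM to counts in large bins.

```python
import numpy as np


def background_scale_factors(bin_counts, min_count=1):
    """Size factors that align the typical (median) background bin count.

    bin_counts: rows = large bins tiling the genome, columns = samples.
    Divide each sample's signal by its factor to equalize the background.
    """
    bins = np.asarray(bin_counts, dtype=float)
    # Drop bins that are (nearly) empty in any sample, e.g. unmappable regions
    bins = bins[np.all(bins >= min_count, axis=1)]
    med = np.median(bins, axis=0)
    # Rescale so the factors have a geometric mean of 1
    return med / np.exp(np.mean(np.log(med)))
```

Since bamCoverage's --scaleFactor multiplies coverage, the value to pass would be the reciprocal of each factor.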

You do not need to normalize your BAM files; csaw or DiffBind will take care of that step for you.

If you want to downsample, you can use samtools view -s when starting from BAM files, or seqtk when starting from FASTQ files. This is generally not needed, though.
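As a sketch of the samtools route, the helper below (hypothetical, not part of any tool) builds one `samtools view -s` command per BAM so every library is downsampled to the depth of the smallest one. It assumes read counts were obtained beforehand (e.g. from samtools flagstat). With `-s INT.FRAC`, the integer part is the seed and the fractional part is the fraction of templates kept; samtools applies the decision per template, so mates stay together.

```python
def downsample_commands(read_counts, seed=42):
    """Build samtools view -s commands that downsample each BAM to the smallest depth.

    read_counts: dict mapping BAM path -> number of reads (assumed precomputed).
    """
    target = min(read_counts.values())
    cmds = {}
    for bam, n in read_counts.items():
        if n == target:
            continue  # the smallest library is kept as-is
        frac = target / n
        # Truncate (not round) so the fraction can never become 1.000000
        digits = int(frac * 1_000_000)
        out = bam[:-4] + ".down.bam" if bam.endswith(".bam") else bam + ".down.bam"
        cmds[bam] = f"samtools view -b -s {seed}.{digits:06d} -o {out} {bam}"
    return cmds
```

For FASTQ input the seqtk equivalent is `seqtk sample -s <seed> reads.fq <fraction>`; using the same seed for the R1 and R2 files keeps the pairs in sync.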


Thank you so much for your help!

(Reply by kinalimeric, 10 months ago)

I partly disagree: it is the majority of peaks, not the background, that should be similar between samples. Libraries can have quite different background noise levels because of technical artifacts. In most cases (and this is the assumption behind normalization strategies such as TMM from edgeR or RLE from DESeq2) a large number of regions (peaks) do not change between conditions. The goal of normalization is to find size factors that center these regions on a log fold change of roughly zero between samples. This matters in ATAC-seq, but even more so in assays like ChIP-seq, where antibody pulldown efficiency can differ strikingly between samples, so background levels can vary a lot even though the peaks themselves barely change.
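To make the "most peaks do not change" assumption concrete, here is a simplified sketch of DESeq2-style median-of-ratios (RLE) size factors in Python, plus a toy example (made-up counts, not real data) showing why plain library-size scaling can fail under composition bias. In the toy data, sample B is sequenced 2x deeper and 10 of 50 peaks are truly 8-fold up; the median ratio ignores those 10 peaks, while the total count is inflated by them.

```python
import numpy as np


def median_of_ratios(counts):
    """Simplified DESeq2-style (RLE / median-of-ratios) size factors.

    counts: rows = peaks, columns = samples. Regions with a zero in any
    sample are skipped because their geometric mean is zero.
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[np.all(counts > 0, axis=1)]
    log_counts = np.log(counts)
    log_ref = log_counts.mean(axis=1, keepdims=True)  # per-region geometric-mean reference
    return np.exp(np.median(log_counts - log_ref, axis=0))


def composition_demo():
    """Return the log2FC of an unchanged peak after RLE vs library-size scaling."""
    a = np.full(50, 100.0)
    b = 2.0 * a  # sample B sequenced twice as deep
    b[40:] *= 8.0  # 10 peaks genuinely up 8-fold in B
    counts = np.column_stack([a, b])
    sf_rle = median_of_ratios(counts)
    sf_lib = counts.sum(axis=0) / counts.sum(axis=0).mean()
    fc_rle = np.log2((b[0] / sf_rle[1]) / (a[0] / sf_rle[0]))
    fc_lib = np.log2((b[0] / sf_lib[1]) / (a[0] / sf_lib[0]))
    return fc_rle, fc_lib
```

With these numbers, the unchanged peaks end up at a log2 fold change of 0 after RLE scaling, but at about -1.26 (roughly 2.4-fold "down") after library-size scaling.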

(Reply by ATpoint, 10 months ago)

I think whether to normalize so that the peaks are most similar, or so that the background is most similar, depends somewhat on the experiment. I've worked with a lot of people whose perturbations I expect to cause large changes in peaks; given that, normalizing to similar backgrounds makes the most sense. If, however, one expects more modest or targeted changes, then I completely agree that normalizing over peaks is preferable.

(Reply by Devon Ryan, 10 months ago)

Agreed. In my experience it depends on the context. If the samples differ in signal-to-noise ratio, go for peak normalization. If their composition is very different, go for background normalization. If you are unlucky and have both effects combined (say, an H3K27ac ChIP in a very early cell type and in a terminally differentiated one, plus very different antibody efficiencies between the conditions), then try both methods and see which one better pushes the majority of regions towards a log fold change of zero. You might also inspect regions that you know do not change, either in a genome browser or by plotting their counts manually. It is a trade-off.
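One way to "try both and compare" can be sketched as follows, assuming a two-sample comparison and a hypothetical set of control regions you believe do not change (for example, promoters of housekeeping genes; that choice is an assumption). Each candidate set of size factors, e.g. one fitted on peaks and one on background bins, is scored by the median absolute log2 fold change of those controls after scaling. Scoring on the same peaks used to fit the factors would be circular, which is why separate control regions are used here.

```python
import numpy as np


def control_log2fc(control_counts, size_factors, pseudocount=1.0):
    """log2 fold change (sample 1 vs sample 0) of control regions after scaling.

    control_counts: rows = regions believed not to change, columns = two samples.
    size_factors: one factor per sample; counts are divided by it.
    """
    norm = np.asarray(control_counts, dtype=float) / np.asarray(size_factors, dtype=float)
    return np.log2((norm[:, 1] + pseudocount) / (norm[:, 0] + pseudocount))


def compare_normalizations(control_counts, candidates):
    """Score candidate size factors by median |log2FC| of the control regions.

    candidates: dict mapping a label -> (factor_sample0, factor_sample1).
    A smaller score means the controls sit closer to the expected log2FC of 0.
    """
    scores = {
        name: float(np.median(np.abs(control_log2fc(control_counts, sf))))
        for name, sf in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

This only ranks the candidates on your chosen controls; it is still worth eyeballing a few of those regions in a genome browser before trusting the winner.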

(Reply by ATpoint, 10 months ago)