Question: Normalization of ATAC SEQ data for the proper deeptools heatmap
kinalimeric wrote:

Hi all,

I have four paired-end ATAC-seq datasets (2 replicates for each of 2 samples). I aligned the reads with Bowtie2, filtered out mitochondrial (MT) reads and duplicates using Picard, and then performed peak calling on the BAM files using MACS2. I also did a differential peak analysis using deeptools and filtered the results by FDR < 0.05 and abs(log2 fold change) > 2.

After this, I generated peak density heatmaps using deeptools. However, in the profile plots at the top of the heatmaps, the peak heights are not the same for the 4 files, although I normalized the BAM files while converting them to bigWig using bamCoverage.

My questions are: Should the heights of those peaks be the same, or is a slight difference acceptable? If not, how can I normalize the data? Should I normalize the BAM files and then do the peak calling again, and if so, which tool do you suggest? Or is DiffBind normalization okay? Also, I am really confused about the difference between coverage file normalization and peak normalization. Lastly, as described in the post "Normalization and differential analysis in ATAC-seq data", how can I downsample each sample?

I would appreciate it if you could explain these points.

[Image: peak signal heatmap]

modified 11 weeks ago by ATpoint • written 11 weeks ago by kinalimeric

ATpoint wrote:

In some cases, normalizing only for sequencing depth might be enough. Often it is not, because of differences in library composition and signal-to-noise ratio between samples. I prefer to scale my bigWig files (or whatever counts you want to normalize) with the normalization factors from edgeR. Code examples and some details are in the linked thread: A: ATAC-seq sample normalization (quantil normalization)
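
As a rough illustration of the scaling idea above (not the exact code from the linked thread), the sketch below turns library sizes and TMM normalization factors into per-sample scale factors. It assumes the factors were already obtained from edgeR (for example with calcNormFactors on a peak count matrix); the sample names, file names, and all numbers are hypothetical.

```python
# Hypothetical per-sample values: raw library sizes (reads counted in peaks)
# and TMM normalization factors exported from edgeR. All numbers are invented.
lib_sizes = {"ctrl_rep1": 20_000_000, "ctrl_rep2": 25_000_000,
             "treat_rep1": 16_000_000, "treat_rep2": 40_000_000}
norm_factors = {"ctrl_rep1": 1.00, "ctrl_rep2": 0.80,
                "treat_rep1": 1.25, "treat_rep2": 1.00}


def scale_factor(lib_size, norm_factor):
    """Reciprocal of the effective library size expressed in millions."""
    effective_millions = lib_size * norm_factor / 1e6
    return 1.0 / effective_millions


scale_factors = {s: scale_factor(lib_sizes[s], norm_factors[s]) for s in lib_sizes}

for sample, sf in scale_factors.items():
    # bamCoverage multiplies the coverage by --scaleFactor.
    # The BAM and bigWig file names are assumptions for this example.
    print(f"bamCoverage -b {sample}.bam -o {sample}.bw --scaleFactor {sf:.6f}")
```

With these made-up numbers the first three samples get a factor of 0.05 and the deepest one 0.025. The factor is passed through bamCoverage's --scaleFactor option without an additional --normalizeUsing method, so that the depth correction is not applied twice.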

modified 11 weeks ago • written 11 weeks ago by ATpoint

Devon Ryan wrote:

I wouldn't expect the peak heights to be identical; some amount of biological variation is normal. The goal of the normalization should instead be to set the background level to roughly similar values between samples.

You do not need to normalize your BAM files; csaw or DiffBind will take care of that step for you.

If you want to downsample, you can use either samtools view -s if starting from BAM files or seqtk if starting from FASTQ files. This is generally not needed.
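
For the BAM route, a minimal sketch of how the subsampling argument could be derived is shown below. It assumes you already know the read count of each BAM file and want to bring every sample down to the smallest library; the sample names, read counts, output file names, and the seed are all hypothetical. The value given to samtools view -s encodes the random seed in its integer part and the fraction of reads to keep in its fractional part.

```python
# Hypothetical read counts per BAM file (invented numbers).
read_counts = {"ctrl_rep1": 40_000_000, "ctrl_rep2": 25_000_000,
               "treat_rep1": 20_000_000, "treat_rep2": 32_000_000}


def subsample_arg(n_reads, target, seed=42):
    """Build the value for `samtools view -s` (integer part = seed, fraction = reads kept).

    Returns None when the sample is already at (or rounds to) the target depth.
    """
    if n_reads <= target:
        return None
    fraction_str = f"{target / n_reads:.4f}"  # e.g. "0.6250"
    if not fraction_str.startswith("0."):     # rounded up to 1.0000: nothing to remove
        return None
    return f"{seed}{fraction_str[1:]}"        # e.g. "42.6250"


target_depth = min(read_counts.values())

for sample, n in read_counts.items():
    arg = subsample_arg(n, target_depth)
    if arg is None:
        print(f"# {sample}: already at target depth, left unchanged")
    else:
        # Output file name is an assumption for this example.
        print(f"samtools view -b -s {arg} -o {sample}.down.bam {sample}.bam")
```

The smallest library is left untouched, and the others are reduced to approximately (not exactly) the same depth, since the subsampling is random.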

written 11 weeks ago by Devon Ryan

Thank you so much for your help!

written 11 weeks ago by kinalimeric

I disagree in part: it is the majority of peaks that should be similar between samples, not the background. Libraries can have quite different background noise levels due to technical artifacts. The assumption behind normalization strategies such as the TMM approach from edgeR or RLE from DESeq2 is that a large number of regions (peaks) do not change between conditions. The normalization goal is to find a size factor that centers these regions on a log fold change of roughly zero between samples. This is important in ATAC-seq but even more important in assays like ChIP-seq, where technical variation due to antibody pulldown efficiency can be strikingly different, so background levels can vary a lot even though peaks are actually not changing much.
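
To make the "center the unchanged regions" idea concrete, here is a minimal pure-Python sketch of the median-of-ratios calculation that underlies RLE-style size factors. It is an illustration of the principle under simplified assumptions, not DESeq2's or edgeR's actual implementation, and the count matrix is invented.

```python
import math
import statistics

# Hypothetical read counts per peak (rows) for two samples (columns).
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [10, 80],   # a peak that genuinely changes between the samples
    [0, 5],     # ignored below: zero count in one sample
]


def median_of_ratios(count_rows):
    """Size factor per sample: median ratio of its counts to the per-peak geometric mean."""
    usable = [row for row in count_rows if all(c > 0 for c in row)]
    n_samples = len(count_rows[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


size_factors = median_of_ratios(counts)

# Dividing by the size factors centers the majority of peaks on a log fold change of zero.
normalized = [[c / sf for c, sf in zip(row, size_factors)] for row in counts]
for raw, norm in zip(counts, normalized):
    print(raw, [round(v, 2) for v in norm])
```

Because the median is used, the one peak that really changes does not drag the size factors along with it; the first three peaks end up with equal normalized counts in both samples.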

modified 11 weeks ago • written 11 weeks ago by ATpoint

I think whether you should normalize so that peaks are most similar or so that background is most similar will depend a bit on the experiment. I've worked with a lot of people perturbing things in ways where I expect a large change in peaks. Given that, normalizing for similar backgrounds makes the most sense. If, however, one expects more modest or targeted changes, then I completely agree that normalizing over peaks is preferable.

written 11 weeks ago by Devon Ryan

Agreed. In my experience it depends on the context. If you have differences in signal-to-noise ratio, go for peak normalization. If you have very different composition, go for background. If you are unlucky and have both effects combined, say a ChIP for H3K27ac in a very early cell and a terminally differentiated one plus very different antibody efficiencies between the conditions, then try both methods and see which one better pushes the majority of regions towards a log FC of zero. Maybe also inspect regions that you know do not change, on a genome browser or by plotting counts manually. It is a trade-off.
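
One simple way to "try both and see" could look like the sketch below: apply each candidate set of scale factors to the peak counts and check which one leaves the median log2 fold change closer to zero. The counts, the two candidate factor sets, and their labels are all hypothetical, and the median is only one of several reasonable summaries.

```python
import math
import statistics

# Hypothetical counts for the same peaks in two conditions (invented numbers).
cond_a = [100, 50, 30, 200, 80]
cond_b = [150, 75, 45, 300, 400]

# Two hypothetical sets of per-sample scale factors (multiplied onto the counts),
# e.g. one derived from peaks and one derived from background bins.
candidates = {
    "peak_based": (1.0, 2 / 3),
    "background_based": (1.0, 1.0),
}


def median_log2_fc(a, b, scale_a, scale_b):
    """Median log2 fold change (b over a) after applying the scale factors."""
    return statistics.median(
        math.log2((y * scale_b) / (x * scale_a)) for x, y in zip(a, b)
    )


medians = {
    name: median_log2_fc(cond_a, cond_b, sa, sb)
    for name, (sa, sb) in candidates.items()
}

# Pick the candidate whose median log2 fold change is closest to zero.
best = min(medians, key=lambda name: abs(medians[name]))

for name, value in medians.items():
    print(f"{name}: median log2FC = {value:.3f}")
print("closest to zero:", best)
```

A check like this only tells you which factors center the bulk of the regions; it does not replace looking at known unchanged regions in a genome browser.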

written 11 weeks ago by ATpoint