Question

Choice of normalization method for spike-in data

0

Entering edit mode

2.7 years ago

eashby47 ▴ 10

Hello,

I am seeking some advice on how to pair a normalization method with spike-in yeast DNA in a time course CUT&RUN experiment with 5 timepoints and 3 replicates per time point. I am using MACS3 to perform peak calls and DiffBind to test for differential binding (using the DESeq2 workflow) between the 0 hr timepoint and subsequent timepoints.

I have tried (a) normalizing tag counts according to spike-in library size and (b) normalizing according to spike-in relative log expression (RLE). In Figure 32 on page 55 of the Diffbind vignette, these two normalization method produce nearly identical results.

However, in my case, I am getting different results. Namely, one time point always produces 0 differentially bound sites, and choice of normalization method determines at which time point the 0-result occurs:

When I normalize by spike-in library size, I obtain the following results table (Group, refers to the time point of comparison, Group2 refers to the 0 hr timepoint, and DB.DESeq2 shows number of differentially bound sites FDR<0.05 at that contrast):

enter image description here

But when I normalize by spike-in RLE, I obtain the following results table:

enter image description here

When I look at the normalized tag count profiles for the top 5 sites identified in each normalization setting, the tag count profiles display different patterns which likely account for the differences in the dba results.

enter image description here

Shown below are my spike-in library sizes per replicate (colored by timepoint). Note that the 0 hr and 24 hr groups have the most variability in the spike-in library size between replicates.

enter image description here

I have two questions that I would like feedback on:

How do we choose an appropriate normalization method to pair with spike-in data? I understand that library size vs. RLE normalization methods have different underlying assumptions, but I don't know how to assess whether these assumptions are reasonable in the spike-in data. For example, an assumption of RLE normalization is balanced differential expression; this doesn't make any sense to me in the context of a spike-in.
Why are my results so different between these two normalization methods? Is the variability in spike-in library sizes driving a wedge between these normalization methods somehow?

I would appreciate any advice! Thanks! -Ethan

spike-in DiffBind • 1.9k views

ADD COMMENT • link 2.7 years ago by eashby47 ▴ 10

score 1 · Answer 1 · 2021-07-28

<tl_dr>

I have always been sceptical towards the use of spike-in in any NGS experiment because what you do is to add a constant amount of DNA to the library. It does not tell you anything about the data quality which can be very different in libraries. And if so then normalizing to a constant spike-in makes little sense. Imagine library1 is very good and you have 30% of reads overlapping peaks. Imagine library2 is crap and you have 1% reads overlapping peaks. If you scale them to a constant spike-in then this does not respect the difference in signal-to-noise ratio. I do not see any reason to use spike-ins unless you expect a global alteration in chromatin profile, in which case it would be the only option to use spike-ins, but it would be questionable if a NGS assay would be the experiment of choice. Others may have different opinions here.

CUT&RUN/TAG is not fundamentally different (in my understaning) to other enrichment-based DNA-seq assays such as ChIP or ATAC-seq. Hence the same normalization methods do apply. I personally like to use the methods from DESeq2 (often called RLE) followed by making MA-plots to inspect normalization. The easiest would be to simply run the standard DESeq2 pipeline and then from the results plot baseMean (on log2 scale) versus fold change, e.g. using the plotMA function from DESeq2 or any custom implemntation. The bulk of points should be somewhat centered along y ~ 0, and the "arrowhead". If that is the case then for me this indicates a good normalization. In case this is not the case you might need to tweak normalization, e.g. only using a subset of peaks for the normalization. I often use only the top 5000 peaks with the highest baseMean (so average counts between both conditions) as these regions have high counts in both conditions and as such are likely (in my head) to not change and as such are most reliable for normalization. After all a common assumption (e.g. for most differential analysis frameworks) is that most regions do not change between conditions or (how Michael Love the DESeq2 developer phrases it) that "the median ratio captures the size relationship. This is what MA-plots visualize.

I do not have an impressive set of example data to illustrate what I said above about choice of regions for normalization at hand right now, but this here should basically reflect it (towards the principle). It's two ATAC-seq samples with a lot of differential regions normalized with either all regions or only the top-5000 and I think top-5000 does a slightly better job as it better centers this "arrowhead" on the farleft at y=0. As said, this is just to illustrate what I described, the difference is not extreme here but if you have lots of changes going on in your data then choice of regions may notable influence the outcome.

enter image description here

Does that make sense to you? Feel free to add some of these plots in case you need feedback. This here is just my personal go-to strategy, others may have different opinions. In any case this here is a data-driven way of doing it as these plots really allow to "look at" you data and decide whether normalization went well rather than doing some blackbox-thing and just blindly trusting any method you choose.