RiP RLE normalisation using spike-in peaks in DiffBind for ChIP-seq
1
0
Entering edit mode
16 months ago
spg • 0

Hi,

I have two conceptual/technical questions about spike-in normalisation of ChIP-seq data.

We did spike-in for PolII ChIP-seq from a mouse cell line with ActiveMotif kit, which uses anti-H2Av antibody with drosophila cells. The mouse and fly chromatin are mixed and a mix of anti-mouse PolII and anti-H2Av antibodies are used to IP. I aligned the reads to mouse and drosophila genome separately. And now I am trying to get normalization factors for each replicate using DiffBind (v. 3.8.3), which I would like to use for differential peak analysis as well as scaling bigwig files.

Qn1: From what I can gather, in DiffBind, by including the spikein bam filles in the Spikein column of sampleSheet, I have the option to use the spikein library size or the csaw background bins method with RLE or TMM. Is there a built-in option to use the reads in peaks (RiP) method from the spike-in datasets (i.e., read counts in H2Av peaks) for normalizing the mouse reads? If there is an option to supply the peaks in the spike-in dataset, I cannot find it.

To my naive thinking, this option better controls for IP efficiency (~signal/noise ratio) per replicate than library size or background bins (assuming the antibody and chromatin concentrations in the replicates are constant). Am I missing something conceptually?

Qn2: To get the normalisation factors using the RiP from the spike-in datasets, I made a dba object using the drosophila sequences ONLY (drosophila bam files + H2Av peaks) and tried the various normalisations (code below). Differential peaks should not be found between conditions here, since these are spike-in chromatin IPs. It looks like the RiP RLE normalisation does an eerily excellent job at normalising the counts, while the library size or background bins methods still give a lot significantly different peaks when they shouldn't (although the fold change is minimal).

I understand, with RiP RLE normalisation, I am more directly normalising the peak signals and plotting the peak signal, so the fact that it has the best effect in this analysis is not surprising in itself. But I am surprised by the near absence of variation between the replicates (meaning a lot of the variation comes from IP efficiency between samples?). Does this mean, RiP RLE normalisation using spike-in peaks is the correct method to normalise my mouse read counts?

I am pretty new to ChIP-seq/DiffBind analyses. Please let me know if anything is wrong or unclear. Thank you!

noNormlibSizecsaw_bg_bins_RLERiP_RLERiP_RLE_y_zoom

>spik.db <- dba(sampleSheet = "../input_data/sampleSheets/P2_dm3_spikein_sampleSheet_DiffBind.csv",
         minOverlap = 1)
>spik.db
4 Samples, 11449 sites in matrix:
ID Tissue Factor Condition  Treatment Replicate Intervals
1  1   CL    P2        WT Full-media         1     10715
2  2   CL    P2        WT Full-media         2      9763
3  3   CL    P2        KO Full-media         1     11166
4  4   CL    P2        KO Full-media         2     11480
>spik.counts <- dba.count(spik.db,minOverlap = 1)
>spik.model <- dba.contrast(spik.counts, minMembers=2, reorderMeta=list(Condition="KO"))
>spik.model <- dba.analyze(spik.model) #default lib size
>dba.plotMA(spik.model, bNormalized = F, sub="No norm")
>dba.plotMA(spik.model, bNormalized = T, sub="Norm: Lib size")
>rm(spik.model)

>spik.model <- dba.contrast(spik.counts, minMembers=2, reorderMeta=list(Condition="KO"))
>spik.model<-dba.normalize(spik.model,normalize = "RLE",background=T) # bkgd bins RLE normalisation
>spik.model <- dba.analyze(spik.model)
>dba.plotMA(spik.model, bNormalized = T, sub="Norm: bkgd bins RLE")
>rm(spik.model)

>spik.model <- dba.contrast(spik.counts, minMembers=2, reorderMeta=list(Condition="KO"))
>spik.model<-dba.normalize(spik.model,normalize = "RLE") # RiP RLE normalisation
>spik.model <- dba.analyze(spik.model)
>dba.plotMA(spik.model, bNormalized = T, sub="Norm: RiP RLE")

>db_report<-dba.report(spik.model,th=1,bCounts = T) #to examine normalized counts of RiP RLE
>plot(db_report$Fold~db_report$Conc,main="RLE (RiP) y-axis zoomed")
csaw ChIP-seq spike-in normalisation DiffBind • 1.0k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode
16 months ago
Rory Stark ★ 2.0k

Answered on Bioconductor Support Forum:

https://support.bioconductor.org/p/9148431/#9148434

ADD COMMENT

Login before adding your answer.

Traffic: 2659 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6