DiffBind spike-in normalisation with varying amounts of spike-in chromatin
8 weeks ago
Drew • 0

Hi there!

I am currently using DiffBind v3.2.7 to analyse some ChIP-seq data for RNA Polymerase III (RNAPIII). We have Drosophila spike-in chromatin in the samples that I would like to use in DiffBind to normalise the data. The problem is that during library prep, some of the samples accidentally got different proportions of the spike-in chromatin relative to sample chromatin.

My question is whether this can be accounted for in DiffBind. So far I have calculated a factor for each sample: that sample's %spike-in divided by the minimum %spike-in across all samples. I then thought to multiply the values in dba$norm$DESeq2$norm.facs by these factors before calling dba.analyze(). I multiply because I believe the norm.facs are used to divide counts during analysis, so libraries with more spike-in get larger norm.facs and are scaled down. Please let me know whether this makes sense and is acceptable, or whether something else can be done (such as subsampling the BAM files to achieve similar read counts before running DiffBind). Thanks!
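To make the arithmetic concrete, here is a minimal sketch of the factor calculation described above. The sample names, spike-in percentages, and starting normalisation factors are all made up for illustration; in practice the starting factors would be whatever DiffBind has stored for the samples. Whether this adjustment is statistically valid is exactly what I am asking.

```r
# Hypothetical %spike-in per sample (illustrative values, not real data)
pct_spikein <- c(ctrl_1 = 2.0, ctrl_2 = 2.5, treat_1 = 4.0, treat_2 = 5.0)

# Factor per sample: that sample's %spike-in over the smallest %spike-in,
# so the sample with the least spike-in gets a factor of exactly 1
spike_scale <- pct_spikein / min(pct_spikein)
print(spike_scale)
#  ctrl_1  ctrl_2 treat_1 treat_2
#    1.00    1.25    2.00    2.50

# Hypothetical existing normalisation factors (stand-ins for the stored norm.facs)
norm_facs <- c(ctrl_1 = 0.9, ctrl_2 = 1.1, treat_1 = 1.0, treat_2 = 1.2)

# Proposed adjustment: samples with more spike-in get a larger factor,
# which scales their counts down if counts are divided by the factor
adjusted_facs <- norm_facs * spike_scale
print(adjusted_facs)
```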

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 12 SP4

Matrix products: default
BLAS:   /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_1.0.7                 DiffBind_3.2.7              profileplyr_1.8.1
[4] SummarizedExperiment_1.22.0 Biobase_2.52.0              GenomicRanges_1.44.0
[7] GenomeInfoDb_1.28.4         IRanges_2.26.0              S4Vectors_0.30.0
[10] MatrixGenerics_1.4.3        matrixStats_0.60.1          BiocGenerics_0.38.0

22 hours ago
Rory Stark ★ 1.3k

Do you know how much spike-in chromatin each sample received?

The problem is that the ChIP itself is expected to yield different amounts of spike-in chromatin in each sample (indeed, these run-to-run differences in ChIP efficiency are what we are attempting to normalize). If the differences in ChIP efficiency are compounded by different initial amounts of spike-in chromatin, the two effects cannot be separated, and the spike-in can't be used to reliably normalize the ChIP.

If you have a core set of RNAPIII sites that are expected to be unchanged across all your samples, you could consider using those sites to normalize instead. Just collect the unchanging sites in a GRanges object and pass it to dba.normalize() using the spikein= parameter.
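A rough sketch of how that might look is below. The site coordinates, the data frame `stable_sites_df`, and the helper `normalise_on_stable_sites()` are hypothetical names introduced only for illustration, and `rnap3` in the usage comment stands for a DBA object that has already been through dba.count(). Treat it as a starting point rather than a tested recipe.

```r
# Hypothetical coordinates for RNAPIII sites assumed not to change between
# conditions; replace these with your own curated set
stable_sites_df <- data.frame(
  chr   = c("chr1", "chr6"),
  start = c(1000000, 2500000),
  end   = c(1000500, 2500500)
)

# Hypothetical helper: convert the table to a GRanges object and hand it to
# dba.normalize() through the spikein= parameter
normalise_on_stable_sites <- function(dba_obj, sites_df) {
  stable_sites <- GenomicRanges::makeGRangesFromDataFrame(sites_df)
  DiffBind::dba.normalize(dba_obj, spikein = stable_sites)
}

# Usage, assuming `rnap3` is an existing DBA object with counts:
# rnap3 <- normalise_on_stable_sites(rnap3, stable_sites_df)
# rnap3 <- DiffBind::dba.analyze(rnap3)
```

The quality of the result depends entirely on how well the chosen sites really are invariant, so it is worth checking them by eye in a genome browser first.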