Question

Difference of DiffBind library size normalisation vs DESeq2 library size normalisation

0

4.0 years ago

m.sadman.sakib ▴ 120

For ChIP-seq analysis I have been using DiffBind until now, but I want to switch to DESeq2 because I need to control for multiple covariates. As far as I know, DiffBind does not currently offer this (only one blocking factor at a time, even if I concatenate the blocking factors). I have the raw counts generated from the BAM files over the MACS2 peaks, and I can do the full analysis. However, I have a question about library size normalisation:

DiffBind does it by default. The DESeq2 vignette also suggests that DESeq2 does library size normalisation by default. The difference I see is that DiffBind takes the library size from the BAM files, which is probably the total number of mapped reads in each BAM file. DESeq2 doesn't have the BAMs, so it probably uses the column-wise sums of the read counts as the library size. These sums would cover only the reads in the regions detected by MACS2, not the whole BAM file, right? Fundamentally, will the two be different or not? I can imagine there are reads in the BAM files that fall outside the MACS2 peaks, so they will not appear in the counts generated by, say, featureCounts. I would really appreciate it if the community could comment on this!

Also, when DiffBind does this default normalisation (bFullLibrarySize = TRUE) and then invokes DESeq2 for the differential analysis, DESeq2 also does its own normalisation. So when someone uses the DiffBind package, does the count matrix get normalised by library size twice? Once from the BAM read counts (DiffBind), and again from the total counts (DESeq2)?

My main point is: can I trust the DESeq2 library size normalisation as opposed to the DiffBind way of library size normalisation, and use DESeq2 alone to analyse my data instead of DiffBind?

One possible solution could be to feed the total mapped read numbers into DESeq2 as an extra column, keep it as a continuous variable, and include that column in the design matrix. Does that sound logical? Has anyone done it this way?

Thank you again for taking your time to read my post! Stay safe!

DESeq2 Diffbind ChIP-Seq RNA-Seq normalisation • 2.9k views

ADD COMMENT • link updated 4.0 years ago by Rory Stark ★ 2.0k • written 4.0 years ago by m.sadman.sakib ▴ 120

score 2 · Answer 1 · 2020-05-06

2

4.0 years ago

Rory Stark ★ 2.0k

When bFullLibrarySize = TRUE, DiffBind bypasses the DESeq2 normalization and performs a simple normalization based on the relative number of reads in each of the BAM files. The counts are not "re-normalized" a second time when DESeq2 is invoked.
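To make the distinction raised in the question concrete, here is a minimal sketch (not DiffBind's actual code) of depth-based scaling computed two ways: from total mapped reads per BAM file, and from the column sums of the peak count matrix. The exact scaling DiffBind applies is not stated in this thread, so the mean-1 scaling, the function names, and all numbers below are assumptions for illustration only.

```python
def depth_scale_factors(lib_sizes):
    """Scale factors proportional to library size, rescaled to have mean 1."""
    mean_depth = sum(lib_sizes) / len(lib_sizes)
    return [size / mean_depth for size in lib_sizes]


def normalize(counts, factors):
    """Divide each sample's column of peak counts by that sample's factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical data: 3 peaks (rows) x 2 samples (columns).
counts = [[100, 300], [50, 150], [10, 30]]

# Made-up total mapped reads per BAM file (the "full" library size).
bam_mapped_reads = [10_000_000, 20_000_000]

# Library size as seen from the count matrix alone: reads inside peaks only.
reads_in_peaks = [sum(col) for col in zip(*counts)]

full_factors = depth_scale_factors(bam_mapped_reads)
peak_factors = depth_scale_factors(reads_in_peaks)
normalized = normalize(counts, full_factors)

print("full library factors:  ", full_factors)
print("reads-in-peaks factors:", peak_factors)
```

In this made-up case the second sample has a larger fraction of its reads inside peaks, so the two sets of factors disagree. Whenever the fraction of reads in peaks varies between samples, the two definitions of library size will give different scalings.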

ADD COMMENT • link 4.0 years ago by Rory Stark ★ 2.0k

0

Dear Rory,

Thank you very much for your swift reply! I understand completely now, and I will do my analysis accordingly. One last question regarding normalisation: if I use DESeq2 normalisation for ChIP-seq analysis, would it be equivalent to DiffBind's simple normalisation in terms of the results?

Also, as far as I know, DiffBind cannot use multiple blocking factors, and you probably suggested (I cannot find the post now, sorry!) using other software (like DESeq2 directly) to model the covariates of complex experimental designs and do the differential binding analysis. But I really love using DiffBind, and I think it is a fantastic package for ChIP-seq analysis, like a Swiss Army knife! Will there be a future update of DiffBind that includes modelling of complex experimental designs?

ADD REPLY • link 4.0 years ago by m.sadman.sakib ▴ 120

0

Dear Rory, one more question. Is this default library size normalisation in DiffBind done on the raw counts?

ADD REPLY • link 3.9 years ago by m.sadman.sakib ▴ 120

0

The default is to normalize counts adjusted as follows:

 max(chip_counts - control_counts,1)
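As an illustration of that adjustment, here is a small sketch applying it per peak for one sample. The function name and the example numbers are made up; only the `max(chip_counts - control_counts, 1)` rule comes from the reply above.

```python
def subtract_control(chip_counts, control_counts, floor=1):
    """Subtract control reads from ChIP reads per peak, never going below `floor`."""
    return [max(chip - ctrl, floor) for chip, ctrl in zip(chip_counts, control_counts)]


# Hypothetical read counts for three peaks in one sample.
chip_counts = [120, 15, 8]
control_counts = [20, 15, 30]

adjusted = subtract_control(chip_counts, control_counts)
print(adjusted)
```

The floor of 1 means that a peak with as many or more control reads than ChIP reads ends up with a count of 1 instead of zero or a negative number.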

ADD REPLY • link 3.9 years ago by Rory Stark ★ 2.0k

score 0 · Answer 2 · 2020-05-06

0

4.0 years ago

Asaf 10k

The assumption behind DESeq2 normalization is that most of the entities (peaks, in your case) do not change across samples. If you think this assumption holds, then you can trust DESeq2 normalization. If you have a set of peaks that you expect to be more stable, you can give this list to DESeq2 so that it normalizes using only these peaks.

ADD COMMENT • link 4.0 years ago by Asaf 10k

0

Thanks a lot for responding so quickly! Could you please elaborate on the library size normalisation question that I had? Say that in DESeq2 the library size is the colSums of the raw counts, while DiffBind probably uses the total mapped reads from the BAM file. These are fundamentally different library sizes. How might the results be affected?

ADD REPLY • link 4.0 years ago by m.sadman.sakib ▴ 120

0

DESeq2 does not normalize by library size. Roughly, for each sample it compares the count of every gene (or peak) against a reference value (the geometric mean of that gene across samples) and takes the median of those ratios as the sample's normalization factor.
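A rough sketch of this median-of-ratios idea in plain Python is given below. It is an illustration under simplifying assumptions, not DESeq2's implementation: the function name, the `control_rows` parameter (a stand-in for restricting the estimate to a set of stable peaks, as suggested earlier in the thread), and the data are all hypothetical.

```python
import math
from statistics import median


def median_of_ratios(counts, control_rows=None):
    """Per-sample size factors from a peaks-by-samples matrix (list of rows).

    control_rows: optional list of row indices to restrict the estimate to.
    """
    rows = counts if control_rows is None else [counts[i] for i in control_rows]
    # Rows containing a zero have a geometric mean of zero, so skip them.
    rows = [r for r in rows if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no rows without zero counts")
    # Log of the geometric mean of each row across samples (the reference).
    log_ref = [sum(math.log(c) for c in r) / len(r) for r in rows]
    factors = []
    for j in range(len(counts[0])):
        log_ratios = [math.log(r[j]) - ref for r, ref in zip(rows, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical data: sample 2 is sequenced twice as deeply as sample 1,
# and the last peak is strongly enriched in sample 2.
counts = [[100, 200], [50, 100], [10, 20], [5, 500]]

factors = median_of_ratios(counts)
column_sums = [sum(col) for col in zip(*counts)]
print("median-of-ratios factors:", factors)
print("column sums:", column_sums)
```

In this example the column sums differ by about fivefold because of the single strongly changed peak, while the median-of-ratios factors differ by only twofold, which matches the depth difference shared by the unchanged peaks. This is why the method relies on most peaks not changing between samples.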

ADD REPLY • link 4.0 years ago by Asaf 10k

0

I see. I might be wrong then! But this is mentioned in the DESeq2 vignette:

The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input.

ADD REPLY • link 4.0 years ago by m.sadman.sakib ▴ 120

0

It does correct for library size, but not by dividing by the total number of reads.

ADD REPLY • link 4.0 years ago by Asaf 10k

1

See this thread: Can someone please explain in simple terms how DESeq2 works?

ADD REPLY • link 4.0 years ago by Asaf 10k

0

I will go through the link. Thanks!

ADD REPLY • link 4.0 years ago by m.sadman.sakib ▴ 120