Question

Differential binding analysis

1

10.0 years ago

Ram ▴ 190

Can anybody explain how dba.analyze in DiffBind performs the differential analysis to retrieve the sites that are differentially bound in the treatment samples?

Thanks a lot.

ChIP-Seq • 2.4k views

updated 2.6 years ago by Ram 43k • written 10.0 years ago by Ram ▴ 190

0

It's using edgeR/DESeq/DESeq2/whatever-you-specify, so...

10.0 years ago by Devon Ryan 104k

0

Yes, that is already given in the description. But suppose I give two samples for two different conditions: how does it manage to say that this number of sites is differentially bound?

10.0 years ago by Ram ▴ 190

0

It's really unclear what your question actually is. Are you confused by that possibility? There may simply be no reliable differences given the data. Do you want to know how the statistics work? If so, read the appropriate edgeR/DESeq/whatever paper.

10.0 years ago by Devon Ryan 104k

score 1 · Answer 1 · 2014-05-16

Did you read the manual? Read sections 7.3, 7.4 and 7.5:

http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

I am just pasting the part that describes what happens when edgeR runs under the hood of dba.analyze.

When dba.analyze is invoked using the default method=DBA_EDGER, a standardized differential analysis is performed using the edgeR package ([4]). This section details the precise steps in that analysis. For each contrast, a separate analysis is performed. First, a matrix of counts is constructed for the contrast, with columns for all the samples in the first group, followed by columns for all the samples in the second group. The raw read count is used for this matrix; if the bSubControl parameter is set to TRUE (as it is by default), the raw number of reads in the control sample (if available) will be subtracted (with a minimum final read count of 1). Next the library size is computed for each sample for use in subsequent normalization. By default, this is the total number of reads in the library (calculated from the source BAM/BED file). Alternatively, if the bFullLibrarySize parameter is set to FALSE, the total number of reads in peaks (the sum of each column) is used. Note that "effective" library size (bFullLibrarySize = FALSE) may be more appropriate for situations when the overall signal (binding rate) is expected to be directly comparable between the samples.
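
To make the count-matrix step concrete, here is a minimal pure-Python sketch of the arithmetic described above. It is not DiffBind's code: the helper names (`subtract_control`, `library_sizes`) and the toy numbers are made up for illustration, and it models only the two rules stated in the quoted text (control subtraction with a floor of 1, and full versus "effective" library size).

```python
def subtract_control(chip_counts, control_counts=None, sub_control=True):
    """Per-peak counts for one sample, mimicking the bSubControl rule.

    Hypothetical helper: subtracts control reads peak by peak and
    floors the result at 1, as the quoted vignette text describes.
    """
    if not sub_control or control_counts is None:
        return list(chip_counts)
    return [max(1, chip - ctrl) for chip, ctrl in zip(chip_counts, control_counts)]


def library_sizes(count_columns, total_reads, full_library_size=True):
    """One library size per sample, mimicking the bFullLibrarySize rule.

    count_columns: list of per-sample count lists (one column per sample).
    total_reads:   total reads in each sample's BAM/BED file.
    """
    if full_library_size:
        return list(total_reads)
    # "Effective" library size: reads that fall in peaks (column sums).
    return [sum(column) for column in count_columns]


# Toy data: two samples from group 1, then two from group 2; three peaks each.
chip = [[120, 15, 40], [100, 20, 35], [30, 18, 90], [25, 22, 80]]
ctrl = [[10, 20, 5], [8, 5, 5], [10, 2, 10], [5, 2, 10]]
totals = [1_000_000, 900_000, 1_100_000, 950_000]

columns = [subtract_control(c, k) for c, k in zip(chip, ctrl)]
full_sizes = library_sizes(columns, totals, full_library_size=True)
effective_sizes = library_sizes(columns, totals, full_library_size=False)
```

In the first sample, the second peak has fewer ChIP reads than control reads, so its count is floored at 1 rather than going negative.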

Next comes a call to edgeR's DGEList function. The DGEList object that results is next passed to calcNormFactors with all other parameters retained as defaults (method="TMM"), returning an updated DGEList object. This is passed to estimateCommonDisp with default parameters.

If the method is DBA_EDGER_CLASSIC, then if bTagwise is TRUE (most useful when there are at least three members in each group of a contrast), the resulting DGEList object is then passed to estimateTagwiseDisp, with the prior set to 50 divided by two less than the total number of samples in the contrast, and trend="none". The final steps are to perform testing to determine the significance measure of the differences between the sample groups by calling exactTest ([5]) using the DGEList with the dispersion set based on the bTagwise parameter.
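
The prior mentioned above is plain arithmetic, so a tiny Python sketch may help. The function name `tagwise_prior` is hypothetical, and which edgeR argument this value is passed to is not stated in the quoted text, so treat this only as an illustration of the formula.

```python
def tagwise_prior(n_samples):
    """Prior as described in the quoted text: 50 / (total samples - 2)."""
    if n_samples <= 2:
        raise ValueError("need more than two samples in the contrast")
    return 50 / (n_samples - 2)


prior_3_vs_3 = tagwise_prior(6)  # three replicates in each group
prior_2_vs_2 = tagwise_prior(4)  # two replicates in each group
```

The value gets smaller as more samples are added to the contrast.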

If the method is DBA_EDGER_GLM (the default), then a design matrix is generated with two coefficients (the Intercept and one of the groups). Next estimateGLMCommonDisp is called; if bTagwise=TRUE, estimateGLMTagwiseDisp is called as well. The model is fitted by calling glmFit, and the specific contrast fitted by calling glmLRT, specifying that the second coefficient be dropped. Finally, an exactTest ([6]) is performed, using either common or tagwise dispersion depending on the value specified for bTagwise.
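
A two-coefficient design matrix is easier to picture with a small example. The Python sketch below only builds the matrices; it does not fit anything. The function name `two_group_design` is hypothetical, and putting the indicator on the second group is an assumption, since the quoted text only says "one of the groups".

```python
def two_group_design(n_group1, n_group2):
    """Rows are samples; columns are (Intercept, group indicator).

    Assumption: the indicator marks samples in the second group.
    """
    group1_rows = [[1, 0] for _ in range(n_group1)]
    group2_rows = [[1, 1] for _ in range(n_group2)]
    return group1_rows + group2_rows


design = two_group_design(3, 3)

# Dropping the second coefficient leaves the null model: intercept only,
# i.e. one shared level for both groups.
reduced = [[row[0]] for row in design]
```

The likelihood ratio test compares the fit of `design` against `reduced` for each site, so a significant result means the group coefficient is needed to explain the counts at that site.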

This final DGEList for contrast n is stored in the DBA object as DBA$contrasts[[n]]$edgeR and may be examined and manipulated directly for further customization. Note however that if you wish to use this object directly with edgeR functions, then the bReduceObjects parameter should be set to FALSE, otherwise the default value of TRUE will result in essential object fields being stripped.

If a blocking factor has been added to the contrast, an additional edgeR analysis is carried out. This follows the DBA_EDGER_GLM case detailed above, except a more complex design matrix is generated that includes all the unique values for the blocking factor. These coefficients are all included in the glmLRT call. The resultant object is accessible as DBA$contrasts[[n]]$edgeR$block.
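
For intuition, here is one plausible shape for such a blocked design matrix, sketched in Python. The quoted text does not say how the blocking levels are coded, so the treatment coding used here (the first block level acts as the reference and gets no column) is an assumption, as are the function name `blocked_design` and the column labels.

```python
def blocked_design(groups, blocks):
    """Build rows of (Intercept, group indicator, block indicators...).

    groups: 0/1 label per sample (1 = second group of the contrast).
    blocks: blocking-factor value per sample, e.g. replicate or batch.
    Assumption: the first block level seen is the reference level.
    """
    levels = list(dict.fromkeys(blocks))  # unique values, first-seen order
    columns = ["Intercept", "group2"] + [f"block_{lvl}" for lvl in levels[1:]]
    rows = [
        [1, g] + [1 if b == lvl else 0 for lvl in levels[1:]]
        for g, b in zip(groups, blocks)
    ]
    return columns, rows


columns, rows = blocked_design(
    groups=[0, 0, 0, 1, 1, 1],
    blocks=["rep1", "rep2", "rep3", "rep1", "rep2", "rep3"],
)
```

With the extra block columns in the model, differences that track the blocking factor (for example a replicate or batch effect) are absorbed by those coefficients instead of inflating the variability attributed to the groups.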