Chip-seq analysis Diffbind
1
0
Entering edit mode
16 hours ago
Irene ▴ 10

Hi everyone.

I’m performing an H3K27ac ChIP-seq analysis on mouse samples from an early-stage model of a neurodegenerative disease. From what I’m seeing, I’m not finding many differences between healthy and diseased mice (only two biological replicates).

In my ChIP-seq analysis with DiffBind, when I reach the dba.analyze step and choose the method, I’ve run into a situation that I don’t fully understand. When I set method = DBA_ALL_METHODS, the analysis runs correctly and reports a total of 1,599 differential peaks using dba.edgeR. However, when I use method = DBA_EDGER, I get an error: “diffbind error in qr.default(contrast): na/nan/inf in foreign function call (arg 1)”. I don’t understand why this error appears, since with DBA_ALL_METHODS the error doesn’t occur and it still reports results for edgeR.

Related to this, I also tried generating MA plots, and the outcome seems quite odd, especially considering that edgeR uses stricter normalization conditions than DESeq2. My FRiP values are very low, as I mentioned… In summary, I’m not sure whether I can trust the results I’m getting, even though I believe I’m following the methodology correctly.

Given that the differences seem so subtle, how can I reliably evaluate them in this case? Thank you very much for any insights!


DIFFBIND ANALYSIS

# BUILT METADATA
> metadata <- data.frame(SampleID = samplesID, \
                       Factor = factor, \
                       Tissue = tissue, \
                       Condition = condition, \
                       Replicate = replicate, \
                       bamReads = bamReads, \
                       ControlID = controlID, \
                       bamControl = bamControl, \
                       Peaks = peaks, \
                       PeakCaller = peakcaller, \
                       peakFormat = peakformat, \
                       ScoreCol = 5 \
)

#  CREATE DBA OBJECT
> dbObj <- dba(sampleSheet=metadata) 

# CREATE CONSENSUS PEAKSET
> consensusPeaks <- dba.peakset(dbObj, consensus = DBA_CONDITION, minOverlap = 2) \
> consensus <- dba(consensusPeaks, consensusPeaks$masks$Consensus, minOverlap=1) \
> consensus.peaks <- dba.peakset(consensus, bRetrieve=TRUE)

# COUNT
> dbObj.counted <-dba.count(dbObj, peaks = consensus.peaks, summits = 500, bParallel = TRUE, bSubControl = FALSE, minOverlap=2) 

# CONTRAST
> dbObj.counted <- dba.contrast(dbObj.counted, categories = DBA_CONDITION, minMembers = 2, reorderMeta=list(Condition=c("wt","htt")))

# ANALYZE
> dbObj.counted <- dba.analyze(dbObj.counted, method = DBA_ALL_METHODS)

Applying Blacklist/Greylists... \
Genome detected: Mmusculus.UCSC.mm10 \
Applying blacklist... \
Removed: 43 of 43259 intervals. \
Counting control reads for greylist... \
Building greylist:  \/mnt/Disco2/IreneCE/chipseq/results/2_bowtie2/1_filter/wt_input_no_chrM_chrUn_chrY_q20_sorted_nd_sorted_NoBlackList_sorted.bam
coverage: 288256 bp (0.01%) \
Building greylist:  \/mnt/Disco2/IreneCE/chipseq/results/2_bowtie2/1_filter/htt_input_no_chrM_chrUn_chrY_q20_sorted_nd_sorted_NoBlackList_sorted.bam
coverage: 510464 bp (0.02%) \
wt_input: 110 ranges, 288256 bases \
htt_input: 167 ranges, 510464 bases \
Master greylist: 203 ranges, 585216 bases \
Removed: 42 of 43216 intervals. \
Removed 85 (of 43259) consensus peaks. \
Normalize edgeR with defaults... \
Normalize DESeq2 with defaults... \
Analyzing... \
gene-wise dispersion estimates \
mean-dispersion relationship \
final dispersion estimates

# MA PLOT

> par(mfrow=c(3,1)) \
> dba.plotMA(dbObj.counted,bNormalized=FALSE) \
> dba.plotMA(dbObj.counted, method=DBA_DESEQ2) \
> dba.plotMA(dbObj.counted, method=DBA_EDGER)

I use the broadPeak option of MACS3, and i have really big peaks, i do not know if i should use the summit option

Min. 1st Qu. Median Mean 3rd Qu. Max. 266 1213 1959 2828 3290 90541

PCA plot

MA plot

FRiP values

Chipseq Diffbind • 91 views
ADD COMMENT
0
Entering edit mode
2 hours ago
ATpoint 90k

I think the results are expected. a) n=2 is underpowered, for any assay and in particular for ChIP-seq which has considerable noise. b) you have spread in the pink gropup, therefore, together with a) are not getting differential calls.

especially considering that edgeR uses stricter normalization conditions than DESeq2

Not sure why you say that. In general, and in my experience, if done properly the two normalization methods are almost identical in terms of output. I am not familiar with the DiffBind internals and how the run things and how they prefilter (if at all). Because, for whatever reason the edgeR method as implemented in DiffBind gets the normalization wrong. The baseline of the third MA-plot is clearly off (too low) hence all these differential calls at the bottom of the plot are false-positives. The second plot from DESeq2 makes perfect sense, it's well centered and due to the noise (probably) the logFCs are all shrunken towards zero. Not sure you can do much about that, other than relaxing the FDR cutoff or taking the top-X regions by nominal P-Value and see whether these enrich near interesting genes. The PCA clearly tells that there is a group effect, but the noise in pink is considerable. The FRiPs are not great but not terrible either.

ADD COMMENT

Login before adding your answer.

Traffic: 4225 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6