Hello,
I am trying to analyze Pol2 ChIP-seq data, and the standard MACS2/peak-counting pipeline doesn't seem to capture what I'm after, since there's binding all over the promoter and coding region. That is, I'm less interested in finding the highest point and tweaking the summits parameter, and more interested in the overall concentration differences (and profile changes) across these gene bodies/promoters. It's kind of like an RNA-seq analysis where I keep duplicates but worry more about normalization.
What I'd like to do is feed dba.count with a GRanges (transcripts) peakset, but I want to make sure I get the settings right. Would something like this work?
DBA <- dba(sampleSheet="sample.sheet.csv")
# the sample sheet is standard, with bamReads, bamControl (and MACS peak files)
GR <- transcripts(txdb)
counts <- dba.count(DBA, peaks=GR)
Should summits be set to FALSE? Maybe I'm missing something more fundamental.
Thank you in advance! Ben
Ok, thanks. That's very helpful.
When it comes to normalization, I've reviewed the user guide as well as your 2020 Bioconductor workshops, and I'm still having a bit of difficulty understanding the difference between the default full library normalization ("lib") and background normalization (normalize=DBA_NORM_NATIVE, background=TRUE). Technically, I understand the difference: background normalization uses native normalization methods, like taking the median of modes from large bins, rather than just scaling for depth.
But strategically, both seem to aim to "level" the background between all samples, based on the assumption that backgrounds should be largely similar across samples. I guess my question is, why might one choose background normalization instead of the default ("lib")? Running both normalizations, I do seem to be getting differences in the number of significantly bound sites in my analyses.
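To make the question concrete, a toy base-R sketch with made-up counts shows one case where the two approaches can diverge: a sample whose background matches another's but which carries many more reads in enriched regions. This is not DiffBind's actual implementation. The median-of-ratios step is a simplified stand-in for a native method such as RLE, and all names and numbers here are hypothetical.

```r
# Toy counts, not real data: 3 samples x 6 large background bins.
background <- matrix(
  c(100, 110,  90, 105,  95, 100,   # sample A
    200, 220, 180, 210, 190, 200,   # sample B: twice the background of A
    100, 110,  90, 105,  95, 100),  # sample C: same background as A
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("bin", 1:6))
)

# Reads falling in enriched regions; C is far more enriched than A.
peak_reads <- c(A = 400, B = 800, C = 1400)

# Full-library scaling: every read counts, including reads in peaks.
lib_size   <- rowSums(background) + peak_reads
lib_factor <- lib_size / mean(lib_size)

# Background-style scaling (simplified median-of-ratios):
# each sample is compared to a per-bin geometric-mean reference,
# using background bins only.
ref       <- exp(colMeans(log(background)))
bg_factor <- apply(background, 1, function(x) median(x / ref))
bg_factor <- bg_factor / mean(bg_factor)

# Normalized signal in enriched regions under each scheme.
norm_lib <- peak_reads / lib_factor
norm_bg  <- peak_reads / bg_factor

# Apparent C-versus-A fold change under each scheme.
fc_lib <- unname(norm_lib["C"] / norm_lib["A"])
fc_bg  <- unname(norm_bg["C"]  / norm_bg["A"])

print(rbind(lib_factor, bg_factor))
print(c(fc_lib = fc_lib, fc_bg = fc_bg))
```

In this toy case the C-versus-A fold change comes out at 1.75 with full-library scaling and 3.5 with background-only scaling, because C's extra reads in enriched regions inflate its library size. Whether real data behave this way depends on how much of each library falls in enriched regions.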
Is there a way to view the binding matrix after each normalization method to see how these normalization methods are affecting specific regions?
Thank you!