help for dba.count settings for RNAPII-ChIP using GRanges peakset
5 months ago
bertb ▴ 10

Hello,

I am trying to analyze Pol2 ChIP-seq data, and the standard MACS2/peak-counting pipeline doesn't seem to capture what I'm after, since there's binding all over the promoter and coding region. That is, I'm less interested in finding the highest point and tweaking the summits factor, and more interested in the overall concentration differences (and profile changes) across these gene bodies/promoters... kind of like an RNA-seq analysis where I keep duplicates but worry more about normalization.

What I'd like to do is feed dba.count with a GRanges (transcripts) peakset, but I want to make sure I get the settings right. Would something like this work?:

DBA <- dba(sampleSheet="sample.sheet.csv")
# sample sheet is standard, with bamReads, bamControl (and MACS peak files)
GR <- transcripts(txdb)

counts <- dba.count(DBA, peaks=GR)


Should summits be set to FALSE? Maybe I'm missing something more fundamental.

DiffBind ChIPseq
5 months ago
Rory Stark ★ 1.6k

Supplying the annotated regions and setting 'summits=FALSE' when calling dba.count() will, in principle, work. A few things worth noting:

• The regions returned by transcripts() include many overlapping intervals, which will be merged, so the binding matrix will be constructed with fewer rows than there are transcripts.
• The default behavior of filtering regions with low read counts in all samples will likely further reduce the number of regions in the final binding matrix. This is desirable as some of the annotated regions may not be enriched in your ChIP.
• You can retrieve the regions actually interrogated by calling dba.peakset() with bRetrieve=TRUE after the call to dba.count():

merged_filtered_regions <- dba.peakset(counts, bRetrieve=TRUE)

• Specifically, you may want to confirm that the merging/filtering process doesn't alter the distribution of region widths too much. You can do this by comparing summary(width(GR)) with summary(width(merged_filtered_regions))

• This analysis can also be run with the summits parameter set. This will take an enriched "sample" of each transcript region for comparison and identify regions where this sample is differentially enriched. This approach is less likely to be thrown off by including very large regions that may contain a larger fraction of "background" reads.
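Putting the points above together, a minimal sketch of the workflow (assuming a TxDb object named txdb and a sample sheet file; the object names and the summits value of 250 are illustrative, not prescriptive):

```r
library(DiffBind)
library(GenomicFeatures)

# Load samples and count reads over annotated transcripts,
# using the full regions rather than re-centering around summits
DBA    <- dba(sampleSheet="sample.sheet.csv")
GR     <- transcripts(txdb)
counts <- dba.count(DBA, peaks=GR, summits=FALSE)

# Retrieve the regions actually interrogated (after merging and filtering)
merged_filtered_regions <- dba.peakset(counts, bRetrieve=TRUE)

# Check that merging/filtering hasn't distorted the width distribution
summary(width(GR))
summary(width(merged_filtered_regions))

# Alternative: take an enriched "sample" within each transcript,
# e.g. a 500bp window around the point of greatest enrichment
counts_summits <- dba.count(DBA, peaks=GR, summits=250)
```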


When it comes to normalization, I've reviewed the user guide, as well as your Bioconductor workshops from 2020, and I'm still having a bit of difficulty understanding the difference between the default full-library-size normalization ("lib") and background normalization (normalize=DBA_NORM_NATIVE, background=TRUE). Technically, I understand the difference: background normalization counts reads in large genomic bins and applies a native normalization method to those bins, rather than just scaling for sequencing depth.

But strategically, both seem to aim to "level" the background between all samples, based on the assumption that backgrounds should be largely similar across samples. I guess my question is, why might one choose background normalization instead of the default ("lib")? Running both normalizations, I do seem to be getting differences in the number of significantly bound sites in my analyses.
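For concreteness, the two calls I'm comparing are roughly the following (using a counts object returned by dba.count):

```r
# Default: scale each sample by its full library size
norm_lib <- dba.normalize(counts, normalize=DBA_NORM_LIB)

# Background normalization: apply a native method to large genomic bins
norm_bg  <- dba.normalize(counts, normalize=DBA_NORM_NATIVE, background=TRUE)
```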

Is there a way to view the binding matrix after each normalization method to see how these normalization methods are affecting specific regions?

Thank you!