Question

Comparison of ChIPseq enrichment in specific regions

7.0 years ago

Roman Hillje ▴ 90

We performed ChIPseq experiments on two treatment groups with ~8 replicates each. We did some genome-wide comparisons, called peaks, used the DiffBind R package to analyze differentially enriched regions, etc. We have now arrived at a set of genes of interest for which we would like to compare the signal around the TSS between the two groups.

I have done this using deepTools multiBamSummary, providing a BED file with the regions of interest. However, as far as I understand, this does not normalize the data (which makes sense, since it is designed to report read coverage).

Do you know a tool that can do the same, just with a normalization step, or would you do something else instead? INPUT is not available.

Thanks!

ChIP-Seq deeptools • 2.2k views

ADD COMMENT • link updated 7.0 years ago by Rory Stark ★ 2.0k • written 7.0 years ago by Roman Hillje ▴ 90

score 1 · Answer 1 · 2017-04-25

7.0 years ago

Devon Ryan 104k

You'll want to use bamCoverage to create 1x normalized bigWig files and then either use multiBigwigSummary or computeMatrix, depending on your exact needs. Those will both then give you normalized values.
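To make "1x normalized" concrete, here is a minimal sketch of the arithmetic behind that kind of scaling: coverage is rescaled so that the genome-wide average depth becomes 1x. The function name and all numbers below are hypothetical, chosen only for illustration; this is not deepTools code, and the real tool handles details such as binning and read filtering.

```python
def one_x_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Return the factor that rescales raw coverage to an average depth of 1x.

    Hypothetical helper for illustration only; not part of deepTools.
    """
    # Average depth before scaling: total sequenced bases / mappable genome size.
    mean_coverage = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / mean_coverage


# Made-up sample: 20 million mapped reads, 200 bp fragments,
# and an effective genome size of 2.0 Gb.
factor = one_x_scale_factor(20_000_000, 200, 2_000_000_000)

# A bin with a raw depth of 10 reads is rescaled by the same factor.
normalized_depth = 10 * factor
print(factor, normalized_depth)
```

Because every sample is divided by its own average depth, values from libraries of different sizes end up on a comparable scale before the regions are summarized.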

You are the best, thanks! :)

ADD REPLY • link 7.0 years ago by Roman Hillje ▴ 90

score 1 · Answer 2 · 2017-04-25

7.0 years ago

Rory Stark ★ 2.0k

You can do this right in DiffBind. The counting function, dba.count(), has a peaks parameter that you can use to pass in your BED file directly. Then you can set the normalized read score using the score parameter (default is to use TMM normalization from the edgeR package, but you can also use RPKM etc.). Input is not required.
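As a rough illustration of what a normalized read score does, the RPKM option mentioned above divides the reads in a region by the region length (in kb) and by the library size (in millions of mapped reads). The sketch below uses a hypothetical function and made-up counts; it is not DiffBind's implementation, and the default TMM normalization is more involved than this.

```python
def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads.

    Hypothetical helper for illustration only; not DiffBind code.
    """
    length_kb = region_length_bp / 1_000
    library_millions = total_mapped_reads / 1_000_000
    return reads_in_region / (length_kb * library_millions)


# Made-up example: the same 2 kb region in two samples. Sample B was
# sequenced twice as deeply and therefore has twice the raw count.
sample_a = rpkm(500, 2_000, 10_000_000)
sample_b = rpkm(1_000, 2_000, 20_000_000)
print(sample_a, sample_b)
```

The raw counts differ by a factor of two, but the normalized scores are equal, which is the behavior needed to compare signal between samples with different sequencing depth.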

That would make things even easier. Unfortunately, I haven't been able to get it running yet. As mentioned above, I have 16 samples in total. When I pass my BED file as the peaks parameter of dba.count(), I get an error (shown below). So I simplified things by loading the peaks as a data frame and limiting it to 4 regions:

    V1        V2        V3
2 chr1  67167872  67175304
3 chr1  93350930  93356987
4 chr2  31323391  31330158
5 chr4 150417259 150420073

Then, I ran:

counts=dba.count(sample, score=DBA_SCORE_TMM_READS_FULL, peaks=peaks, bParallel=FALSE)

After all the samples are processed, this is the error I see:

Error in if (is.unsorted(unique(pv$vectors[, 1]))) { :
  missing value where TRUE/FALSE needed

This is the same error I received when using the BED file instead of a data frame. Any idea what could be causing this? I'll keep trying in the meantime.

ADD REPLY • link 7.0 years ago by Roman Hillje ▴ 90