Question

How To Subtract A Coverage Track Of Mock Ip From A Set Of Chip Tracks?

10.2 years ago

alexxhardt ▴ 10

I have a set of ChIP data, most of them with a transcription factor, but also control data, which was not the raw input chromatin, but instead they performed a "mock" ChIP, i.e. pulled out chromatin with a bead on which no antibodies were attached.

This mock track shows a strong co-movement with the actual ChIP tracks, which made me wonder if there is a way to correct this bias in the actual ChIP tracks. My first naive idea would be to just subtract the mock track from the ChIP tracks. This is of course not smart, but hopefully clarifies what I'm looking for.

Does anyone know of literature that addresses this problem?

score 4 · Answer 1 · 2014-02-06

There are a couple of issues at play here. Are you looking to adjust tracks for visual inspection, or to evaluate peaks quantitatively? Is your data in the form of raw coverage or normalized coverage (e.g. RPM for some interval size)? Are you using peak-finding software?

For visual inspection, any adjustment that makes sense and helps you form an intuition for further analysis is fine. This includes what you suggested: if you subtract the mock IP RPM signal from the test IP RPM signal, the result would hopefully be peaks of enrichment. Alternatively, you could divide test by mock and also hope to see peaks, but in my experience this is usually more difficult to navigate than you might think. Dividing small numbers by small numbers easily produces large ratios, so the resulting track tends to be full of peaks, and focusing on the ones you're looking for is not always easy.
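To make the subtraction-versus-division point concrete, here is a minimal Python sketch on made-up bin counts. The function names, the bin values, the flooring of negative differences at zero, and the pseudocount are all assumptions for illustration; none of them come from a particular tool.

```python
def to_rpm(counts, total_reads):
    """Scale raw per-bin read counts to reads per million mapped reads."""
    scale = 1_000_000 / total_reads
    return [c * scale for c in counts]


def subtract_track(test, mock):
    """Bin-wise difference, floored at zero (an illustrative choice)."""
    return [max(t - m, 0.0) for t, m in zip(test, mock)]


def ratio_track(test, mock, pseudocount=0.0):
    """Bin-wise ratio; bins with a zero denominator become None."""
    out = []
    for t, m in zip(test, mock):
        denom = m + pseudocount
        out.append((t + pseudocount) / denom if denom > 0 else None)
    return out


# Hypothetical counts for six bins: bin 2 is a strong peak,
# bins 4 and 5 are nearly empty in both samples.
test_counts = [20, 22, 200, 18, 3, 0]
mock_counts = [20, 20, 40, 20, 1, 0]

test_rpm = to_rpm(test_counts, total_reads=2_000_000)
mock_rpm = to_rpm(mock_counts, total_reads=2_000_000)

diff = subtract_track(test_rpm, mock_rpm)
ratio = ratio_track(test_rpm, mock_rpm)
# Adding a pseudocount (assumed value, in RPM units) damps low-coverage ratios.
damped = ratio_track(test_rpm, mock_rpm, pseudocount=5.0)

for i, (d, r, s) in enumerate(zip(diff, ratio, damped)):
    print(i, d, r, round(s, 2))
```

In this toy example the near-empty bin 4 (3 reads versus 1) gets a ratio of 3, which looks comparable to the real peak's ratio of 5, while its difference is tiny next to the peak's. A pseudocount is one common way to tame this, but the value to use is a judgment call.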

If you're looking to evaluate the peaks numerically, you have several options. For example, MACS will allow you to compare two IPs: you can call peaks with the test IP as treatment and the mock IP as control. (The more common control would be input chromatin, but you said you don't have that, so unfortunately it won't be part of the analysis.) You can also evaluate each IP independently, since MACS uses a regional background model that you can adjust if you choose. Given these result sets, you can compare overlaps (using bedtools or GenomicRanges in R) to see if there are any peaks from the IP/mock comparison that survive subtraction of the mock-only peaks.

Lastly, as Jason also suggested, you can use edgeR or DESeq to statistically evaluate the difference between your mock and test IP peaks. There are a couple of ways to approach this depending on the specific questions you want to answer, but if your test IP and mock IP are "tracking closely" as you describe, you could combine the IPs, find peaks (e.g. using MACS2), then take your peaks (or just the summits +/- some interval) and use those as "gene definitions", i.e. as intervals in which to count reads from (1) your test IP and (2) your mock IP (hopefully you have replicates?). These interval counts are then the input for edgeR analysis, and should allow you to numerically gauge enrichment of your test over your mock. This analysis rests on various assumptions, though, and how well they hold really depends on your data set. It would be useful to simply look at some distributions associated with your data (e.g. make plots and examine all the stats returned by MACS for your peaks in each data set).
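The "summits +/- some interval as gene definitions" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the summit coordinates, sample names, read positions, and the 100 bp half-width are invented, reads are reduced to a single position each, and overlapping windows are not merged. In practice the summits would come from the peak caller and the reads from your alignments.

```python
from bisect import bisect_left


def summit_windows(summits, half_width):
    """Turn (chrom, summit) pairs into half-open (chrom, start, end) windows."""
    return [(chrom, max(pos - half_width, 0), pos + half_width)
            for chrom, pos in summits]


def count_reads(windows, read_positions):
    """Count reads falling in each window.

    read_positions maps chromosome -> sorted list of read positions
    (for example 5' ends or fragment midpoints).
    """
    counts = []
    for chrom, start, end in windows:
        positions = read_positions.get(chrom, [])
        # Number of positions p with start <= p < end.
        counts.append(bisect_left(positions, end) - bisect_left(positions, start))
    return counts


# Hypothetical summits from peaks called on the combined IPs.
summits = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
windows = summit_windows(summits, half_width=100)

# Hypothetical sorted read positions per sample.
samples = {
    "test_rep1": {"chr1": [905, 950, 1000, 1010, 1099, 4950, 5000],
                  "chr2": [250, 399, 400]},
    "mock_rep1": {"chr1": [950, 1100, 5050], "chr2": [100]},
}

count_matrix = {name: count_reads(windows, reads) for name, reads in samples.items()}

# Tab-separated table: one row per window, one column per sample.
print("\t".join(["region"] + list(count_matrix)))
for i, (chrom, start, end) in enumerate(windows):
    row = [f"{chrom}:{start}-{end}"] + [str(count_matrix[s][i]) for s in count_matrix]
    print("\t".join(row))
```

A table of this shape (regions by samples, raw integer counts) is the kind of input a count-based package such as edgeR works from; with replicates, each replicate would simply be another column.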

score 1 · Answer 2 · 2014-02-06

I know of people who have used edgeR to normalize ChIP-seq data against a mock IP or untagged IP reference. edgeR is an R package that normalizes tag count data. I wouldn't simply subtract the mock IP reads from the IP reads for each gene or location. It would probably be better to take a ratio of IP over mock IP. HOMER may also have a function that can adjust for the mock IP. HOMER is a command-line program with many functions that are useful for analyzing ChIP-seq or RNA-seq data (e.g. detecting peaks, RPKM, composite plots, and much more). You could actually use HOMER to assign your tag counts to genes and then normalize in edgeR if you want to. Or, as I mentioned, since HOMER can call peaks, calling peaks and normalizing to the mock IP may be really useful too.
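As a rough illustration of "a ratio of IP over mock IP" per region, the sketch below scales counts by library size and takes a log2 ratio. This is plain counts-per-million scaling, not edgeR's or HOMER's normalization; the function name, the pseudocount of 1, and the example numbers are assumptions.

```python
import math


def log2_enrichment(ip_count, mock_count, ip_total, mock_total, pseudocount=1.0):
    """log2 of library-size-scaled IP over mock for one region.

    The pseudocount (an assumed value) keeps regions with zero mock
    reads from producing a division by zero.
    """
    ip_cpm = (ip_count + pseudocount) * 1e6 / ip_total
    mock_cpm = (mock_count + pseudocount) * 1e6 / mock_total
    return math.log2(ip_cpm / mock_cpm)


# Hypothetical region: 79 IP reads out of 4M, 19 mock reads out of 2M.
enrichment = log2_enrichment(79, 19, ip_total=4_000_000, mock_total=2_000_000)
print(round(enrichment, 3))

# A region with no mock reads still gets a finite value.
no_mock = log2_enrichment(9, 0, ip_total=4_000_000, mock_total=2_000_000)
print(round(no_mock, 3))
```

Working on the log scale makes enrichment and depletion symmetric around zero, and scaling by library size matters here because the IP library in the example is twice as deep as the mock.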