Question

Is it necessary to normalize sequencing data before input into MACS for ChIP-seq?

9.8 years ago

nash.claire ▴ 510

Hi all,

I'm a newbie and I'm trying to analyse some ChIP-seq data on Galaxy. I have a question about normalizing sequencing libraries for MACS. I have 2 cell types, each with an input control and a TF ChIP, and all 4 libraries have a different number of aligned reads (following bowtie alignment and removal of multi-mapping reads). Here is a made-up example:

                 Cell type A    Cell type B
Input control    29 300 000     28 300 000
TF IP            26 100 000     24 700 000

Now, in order to compare cell type A to cell type B, do I need to normalize the library sizes somehow before inputting them into MACS? I know MACS will treat the input controls as background and therefore normalize each ChIP to its own input, but is that enough? I have spoken to a bioinformatician who suggested randomly subsampling 3 of the libraries above so that they all match the smallest one (i.e. 24 700 000 reads in the example above). Is this what MACS would do anyway, or is it something I should do before running MACS? Also, what is your opinion on this type of normalization? Is it fair? What about simply reporting in terms of tags per million, as we would in RNA-seq for example?

ChIP-Seq • 3.2k views


score 1 · Answer 1 · 2015-09-20

1

9.8 years ago

Devon Ryan 105k

MACS scales the signals to account for differences in sequencing depth, so there's no need to subsample beforehand. That scaling is generally OK, though you can run into issues if the sequencing depth of one sample is far lower than the other's (to be fair, there's not much that can be done in that case).
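
To make the arithmetic concrete, here is a minimal Python sketch using the made-up read counts from the question. It is not MACS's own code. It assumes the larger library of each ChIP/input pair is linearly scaled down to the smaller one (which, as far as I know, is what MACS2's `--scale-to small` default does; check the documentation for your version). The names `scale_to_smaller` and `subsample_fraction` are hypothetical helpers invented for this illustration.

```python
# Made-up aligned-read counts from the question (not real data).
libraries = {
    "A": {"input": 29_300_000, "chip": 26_100_000},
    "B": {"input": 28_300_000, "chip": 24_700_000},
}


def scale_to_smaller(chip_reads, control_reads):
    """Return (chip_factor, control_factor) that linearly scale the
    larger library of a ChIP/input pair down to the smaller one."""
    smaller = min(chip_reads, control_reads)
    return smaller / chip_reads, smaller / control_reads


def subsample_fraction(reads, target):
    """Fraction of reads to keep if a library were randomly
    subsampled down to `target` reads (the alternative suggested
    in the question)."""
    return min(1.0, target / reads)


# Smallest of the four libraries, the subsampling target in the question.
smallest = min(n for counts in libraries.values() for n in counts.values())

for cell_type, counts in libraries.items():
    chip_f, ctrl_f = scale_to_smaller(counts["chip"], counts["input"])
    print(f"Cell type {cell_type}: ChIP factor {chip_f:.3f}, "
          f"input factor {ctrl_f:.3f}")
    for name, reads in counts.items():
        frac = subsample_fraction(reads, smallest)
        print(f"  {name}: keep {frac:.1%} of reads to reach {smallest:,}")
```

Scaling applies a multiplicative factor to the signal and keeps every read, whereas subsampling throws reads away to reach the same nominal depth. With libraries as similar in size as the ones in the question, both factors stay close to 1.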