I'm a newbie trying to analyse some ChIP-seq data on Galaxy, and I have a question about normalizing sequencing libraries for MACS. I have 2 cell types, each with an input control and a TF ChIP, so 4 libraries in total. All 4 have a different number of aligned reads (after bowtie alignment and removal of multi-mapping reads). For a made-up example:
                  Cell type A    Cell type B
Input control     29,300,000     28,300,000
TF IP             26,100,000     24,700,000
Now, to compare cell type A with cell type B, do I need to normalize the library sizes somehow before running MACS? I know MACS treats each input control as background and therefore normalizes each TF IP against its own input, but is that enough? A bioinformatician I spoke to suggested randomly subsampling the 3 larger libraries so that they all match the smallest one (i.e. 24,700,000 reads in the example above). Is this what MACS would do anyway, or is it something I should do before running MACS? Also, what is your opinion of this type of normalization? Is it fair? What about simply reporting tags per million, as we would for RNA-seq?
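To make the two options concrete, here is a minimal sketch of the arithmetic, using only the made-up read counts above. It computes (a) the fraction of reads each library would keep if everything were subsampled down to the smallest library, and (b) the factor that converts a raw tag count into tags per million. The function names (`subsample_fractions`, `per_million_factors`, `subsample_reads`) are hypothetical illustrations, not part of MACS or Galaxy, and this is not a description of what MACS does internally. On real BAM files the subsampling itself would be done by a tool such as `samtools view -s`, with the fraction computed here as input.

```python
import random

# Made-up uniquely aligned read counts from the example above.
LIBRARIES = {
    "A_input": 29_300_000,
    "A_TF_IP": 26_100_000,
    "B_input": 28_300_000,
    "B_TF_IP": 24_700_000,
}


def subsample_fractions(sizes):
    """Fraction of reads to keep per library so every library matches the smallest one."""
    target = min(sizes.values())
    return {name: target / n for name, n in sizes.items()}


def per_million_factors(sizes):
    """Multiply a raw tag count by this factor to express it as tags per million."""
    return {name: 1_000_000 / n for name, n in sizes.items()}


def subsample_reads(reads, target, seed=0):
    """Draw `target` reads uniformly at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, target)


if __name__ == "__main__":
    for name, frac in subsample_fractions(LIBRARIES).items():
        print(f"{name}: keep {frac:.3f} of reads")
    # Hypothetical peak with 260 raw tags in the cell type A TF IP library.
    tpm = 260 * per_million_factors(LIBRARIES)["A_TF_IP"]
    print(f"260 tags in A_TF_IP = {tpm:.2f} tags per million")
```

Note the trade-off the sketch exposes: subsampling throws away reads from the larger libraries, whereas per-million scaling keeps all reads and only rescales the counts.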