Although the Input control is commonly used in ChIP-seq analysis (e.g. for normalization, or as background for peak calling), it seems hard to find an easily understandable description of how it should be prepared (before sequencing) and used in the later data analysis stages. When calculating ChIP enrichment of a region of interest in a ChIP-qPCR experiment, all you need is the relative amount of cells or DNA used for the Input control and the IP experiment. With ChIP-seq, the steps are much more complex and I don't really understand what to do with the Input control.
I want to examine the average signal strength of the region(s) of interest against the background (the signal in the Input control) using BAM files. My main question is: which steps in the data analysis process account for the difference in raw sequencing read output between the Input control and the IP sample, so that one can tell whether the signal at particular regions is higher or lower than the background?
I ask because I have assumed that signal strength can only be compared when the total number of reads in the Input and IP samples is the same, or has been scaled to be the same. Does normalization (e.g. RPGC: reads per genomic content; RPKM: reads per kilobase per million mapped reads) or some other step take care of this? Apart from plotting the signal profile, do I need to do anything to address the read number differences before peak calling?
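To make clear what I mean by "scaled to the same", here is a minimal Python sketch of my current understanding. The function names, the pseudocount and the read counts are all made up for illustration; this is not the method of any particular tool, and I may well have the logic wrong.

```python
import math


def rpkm(region_reads, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return region_reads * 1e9 / (region_length_bp * total_mapped_reads)


def log2_ip_over_input(ip_region_reads, ip_total_reads,
                       input_region_reads, input_total_reads,
                       pseudocount=1.0):
    """log2(IP / Input) for one region after simple depth scaling.

    The Input count is rescaled to the IP sequencing depth, so a sample
    that simply received more reads does not look "enriched".
    The pseudocount (an arbitrary choice here) avoids division by zero.
    """
    scaled_input = input_region_reads * (ip_total_reads / input_total_reads)
    return math.log2((ip_region_reads + pseudocount) /
                     (scaled_input + pseudocount))


# Hypothetical counts: the IP library was sequenced twice as deeply as the Input.
ip_total, input_total = 20_000_000, 10_000_000
ip_region, input_region = 300, 50  # reads overlapping a 1 kb region

print(rpkm(ip_region, 1000, ip_total))
print(rpkm(input_region, 1000, input_total))
print(log2_ip_over_input(ip_region, ip_total, input_region, input_total))
```

If this is right, the raw 300 vs. 50 comparison overstates the enrichment, because half of that difference comes only from the deeper sequencing of the IP sample. Is this the kind of scaling that the normalization options or the peak caller already do internally?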
Also, is it normal to get different numbers of raw sequencing reads (e.g. the largest difference among my samples is 50%) even when the same amount of DNA sequencing library was submitted for each sample?
Thanks very much!!
Kylie (NGS beginner)