I use BEDtools extensively for my data analyses, and I am confused about the BEDtools coverage tool.
Here is a summary:
Aim of the exercise: to compute the percentage of genic length targeted in my ChIP (IP over Input).
- bin my metagenes of interest (metagenes.bed) into consecutive 100 bp bins using BEDtools windowMaker
- count the reads over the gene bins for IP.bam and Input.bam separately using BEDtools coverageBed
- in the coverageBed output, normalize the read counts per bin in IP and Input to CPM (counts per million)
- calculate the log2 ratio of normalized IP/Input to find which gene bins have a log2 ratio >= 1
- sum the base pairs across the gene bins that have a log2 ratio >= 1
- this sum should be the total genic length that is enriched in my IP.
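To make the normalization, log2-ratio, and summing steps above concrete, here is a minimal Python sketch of what I am doing. It is an illustration under stated assumptions, not my exact script: it assumes the per-bin read counts have already been pulled out of the coverage output into tuples, that CPM means the bin count times one million divided by the total reads of that sample, and that a pseudocount of 1 is added to each count to avoid taking the log of zero. The function names, the pseudocount, and the toy numbers are all hypothetical.

```python
import math

def cpm(count, total_reads):
    """Counts per million: scale a raw bin count by the sample's library size."""
    return count * 1_000_000 / total_reads

def enriched_length(bins, ip_total, input_total, threshold=1.0, pseudocount=1.0):
    """Sum the lengths of bins whose log2(IP CPM / Input CPM) >= threshold.

    bins: iterable of (start, end, ip_count, input_count) tuples.
    pseudocount is an assumption, added to both raw counts to avoid log2(0).
    """
    total_bp = 0
    for start, end, ip_count, input_count in bins:
        ip_cpm = cpm(ip_count + pseudocount, ip_total)
        input_cpm = cpm(input_count + pseudocount, input_total)
        log2_ratio = math.log2(ip_cpm / input_cpm)
        if log2_ratio >= threshold:
            total_bp += end - start  # BED intervals are half-open
    return total_bp

# Toy bins (made-up numbers), both libraries assumed to hold 1,000,000 reads.
toy_bins = [
    (0, 100, 39, 9),    # log2 ratio 2  -> counted
    (100, 200, 9, 9),   # log2 ratio 0  -> skipped
    (200, 300, 19, 9),  # log2 ratio 1  -> counted
    (300, 350, 0, 9),   # log2 ratio <0 -> skipped
]
print(enriched_length(toy_bins, 1_000_000, 1_000_000))  # 200
```

Dividing the returned value by the total length of all bins would give the percentage of genic length enriched.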
I have done all six of these steps. My question is this:
Before calculating the log2 ratio (IP/Input), I normalize by CPM as mentioned above. But do I also have to normalize by the "breadth of coverage" per bin that is reported in the coverageBed output?
If yes, how do I do this? It would significantly affect my final log2 ratios and the total base pairs that I count as the genic length enriched in my IP.
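To show why this matters, here is a small sketch of one possible reading of "normalizing by breadth": dividing each bin's read count by the fraction of the bin covered by at least one read. This is only my assumption about what such a normalization could look like, not something I know to be the recommended approach, and the function name and numbers are hypothetical. Equal library sizes are assumed so the CPM factors cancel.

```python
import math

def breadth_scaled_count(read_count, covered_bp, bin_length):
    """Divide a bin's read count by the fraction of the bin covered by reads.

    Hypothetical scaling, used only to illustrate the question.
    Returns 0.0 for a bin with no covered bases.
    """
    fraction_covered = covered_bp / bin_length
    if fraction_covered == 0:
        return 0.0
    return read_count / fraction_covered

# One toy 100 bp bin: (read count, covered bp) for each sample.
ip = (40, 100)   # IP reads cover the whole bin
inp = (10, 50)   # Input reads cover half of the bin

plain_log2 = math.log2(ip[0] / inp[0])
scaled_log2 = math.log2(
    breadth_scaled_count(*ip, 100) / breadth_scaled_count(*inp, 100)
)
print(plain_log2, scaled_log2)  # 2.0 1.0
```

In this toy case the same bin drops from a log2 ratio of 2 to exactly 1, so bins near the threshold could move in or out of the enriched set depending on whether breadth is used.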
Thank you so much in advance for your help.