Dear Biostars community,
I have assigned the reads in my IP.bam and Input.bam to gene coordinates of interest provided in a bed file. I have done this using bedtools coverageBED.
Now, I have the normalized read counts per specified gene bin for IP and Input. I would like to extract those genes which are enriched in my IP by calculating a log2ratio of reads_in_IP/reads_in_Input.
But the problem is that some gene coordinates (or gene bins) have zero reads assigned in the input, which makes it not possible to calculate a log2 ratio.
if I add a pseudo count of +1, this will significantly change the value of my log2ratios. then, if I were to use a threshold log2ratio = 1.5 to filter IP targeted genes, it will also extract "fake" genes that are only being selected because of the pseudo count.
for example, if for one gene bin, IP=7 reads and Input=0 reads, if I add pseudo count of 1 to both, I get, IP=8 reads, and Input=1 read. then log2ratio=3. given a threshold for enrichment as log2ratio=1.5, this bin would be selected as targeted in IP, when actually it might not be!
How can I correct for this????
I realize i will encounter this problem even if I use another method to assign the reads (such as bedinteresect, BEDOPS or samtools)
Can someone please advice me how to solve this?
Many thanks for your help.