I want to normalise my paired-end ChIP-seq data (4 samples treated, 4 samples control) to reads per million (RPM). So far my calculation takes the number of reads mapped to a position, divides it by the total number of mapped reads in the sample, and then multiples this by one million. My question is, should the number of mapped paired-end reads be counted as 1 or 2 in the calculation? I believe Samtools reports the total number of mapped reads as half the actual number of paired-end reads which map?
My current pseudo script for normalising to RPM:
- mappedReads = samtools view -c -F 4 input.bam
- scalingFactor = 1000000 / mappedReads
- bedtools genomecov -ibam input.bam -bg - scale scalingFactor -g chrom.sizes > output.bedGraph
- bedGraphToBigWig input.bedGraph chrom.sizes output.bigWig