Question

RPM in ChIP-seq analysis

2.3 years ago

donny.dw ▴ 20

I am trying to understand RPM in ChIP-seq analysis. As I understand it, RPM is the number of reads mapped to a gene divided by the total number of mapped reads (in millions). When I generate the bedGraph without a "gene size", the values are too small.

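mappedReads=$(samtools view -c -F 260 input.bam)  # one way to count mapped reads (assumes samtools is available)
scalingFactor=$(echo "1000000 / $mappedReads" | bc -l)
bedtools genomecov -ibam input.bam -bg -scale "$scalingFactor" -g chrom.sizes > output.bedGraph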

chr1    17441   17450   0.0621333
chr1    17450   17451   0.124267
chr1    17451   17592   0.1864
chr1    17592   17597   0.124267
chr1    17597   17602   0.0621333
chr1    139719  139766  0.0621333
chr1    139766  139870  0.124267
chr1    139870  139917  0.0621333
chr1    181344  181443  0.0621333

Do I need to multiply by some value to get RPM? What "gene size" should I use? 1000 or 10000?

RPM ChIPseq • 1.5k views

ChIP-seq has no "gene size" metric. That comes from RNA-seq, where you correct for the fact that longer genes collect more counts. It is not relevant in ChIP-seq, and in general RPM is outdated and performs poorly. Use this approach instead:

=> ATAC-seq sample normalization (same applies to ChIP-seq)

Here is why RPM is bad:

=> TMM-Normalization

2.3 years ago by ATpoint 82k

score 0 · Answer 1 · 2021-12-31

2.3 years ago

seidel 11k

In ChIP-seq analysis, RPM refers to read depth per base, per million mapped reads. You have to silently tack the "per base" part on in your mind when you read the unit: the feature size is 1 bp, not a gene length.

"the values are too small."

Have you confirmed this? That is, have you looked at the raw depth over a base and divided it by the number of mapped reads in millions?
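As a quick sanity check on the numbers in the question, here is a minimal sketch of the per-base arithmetic. The helper names `rpm` and `raw_depth` are made up for illustration, and the mapped-read count 16094429 is not from the original post: it is an assumption, back-calculated as roughly 1000000 / 0.0621333 on the guess that the smallest bedGraph value corresponds to a raw depth of 1.

```shell
#!/usr/bin/env bash
# ASSUMPTION: 16094429 mapped reads, inferred from 1000000 / 0.0621333;
# the real count for input.bam was not given in the question.
mapped_reads=16094429

# Hypothetical helper: scale a raw per-base depth to reads per million.
# $1 = raw depth at a base, $2 = total mapped reads
rpm() {
    awk -v d="$1" -v n="$2" 'BEGIN { printf "%.6g\n", d * 1000000 / n }'
}

# Hypothetical helper: recover the raw depth from a scaled bedGraph value.
# $1 = scaled value, $2 = total mapped reads
raw_depth() {
    awk -v v="$1" -v n="$2" 'BEGIN { printf "%.0f\n", v * n / 1000000 }'
}

rpm 1 "$mapped_reads"              # 0.0621333
rpm 2 "$mapped_reads"              # 0.124267
rpm 3 "$mapped_reads"              # 0.1864
raw_depth 0.1864 "$mapped_reads"   # 3
```

If that assumption holds, the three distinct values in the bedGraph are simply depths of 1, 2 and 3 reads, already scaled to reads per million, so no further multiplication by a "gene size" would be needed.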
