Question

Normalizing Bam Files

10.4 years ago

ChIP ▴ 600

Hi!

A very short question: I have two BAM files from a ChIP-seq experiment. File A has 29 million reads and File B has 47 million reads. The problem arises when I count the tags from these two files in the genomic regions in question, because one file has a higher number of reads than the other.

Is there a way to normalise these two files?

I know the normalisation can also be done after counting the tags in the regions (commonly referred to as region-based normalisation).

Thank you

chip-seq normalization • 4.2k views

updated 10.4 years ago by Sean Davis 26k • written 10.4 years ago by ChIP ▴ 600

score 1 · Answer 1 · 2013-12-10

10.4 years ago

Sean Davis 26k

You have it right. Normalization and analysis are done at the count level, not at the BAM-file level. There are several reasons for this, but the most important is that most statistical approaches to ChIP-seq data depend on the actual counts, not just the relative counts. You could down-sample your larger BAM file, but that would definitely be counterproductive, since it throws away reads you have already sequenced.
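As a rough illustration of normalising at the count level rather than the BAM level, here is a minimal sketch that assumes tag counts per region have already been obtained and uses simple library-size scaling (counts per million). The function name `counts_per_million`, the region names and the counts are hypothetical; only the 29 and 47 million library sizes come from the question. The raw integer counts are kept unchanged so a count-based statistical method can still use them.

```python
def counts_per_million(region_counts, library_size):
    """Scale raw tag counts in each region by total library size (in millions).

    Returns a new dict; the input dict of raw integer counts is not modified,
    so the original counts stay available for count-based statistics.
    """
    scale = library_size / 1_000_000
    return {region: count / scale for region, count in region_counts.items()}


# Library sizes from the question: 29 million and 47 million reads.
LIB_A = 29_000_000
LIB_B = 47_000_000

# Hypothetical raw tag counts in three regions of interest.
counts_a = {"region1": 290, "region2": 58, "region3": 1450}
counts_b = {"region1": 470, "region2": 94, "region3": 2350}

cpm_a = counts_per_million(counts_a, LIB_A)
cpm_b = counts_per_million(counts_b, LIB_B)

for region in counts_a:
    # raw A, raw B, scaled A, scaled B
    print(region, counts_a[region], counts_b[region], cpm_a[region], cpm_b[region])
```

With these made-up numbers the raw counts differ by a factor of 47/29, but the scaled values are identical (10.0, 2.0 and 50.0 per million in both libraries). Plain library-size scaling implicitly assumes that most regions are not differentially bound between the two samples.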

Hi! So I should count the tags in each region and normalise like this: norm = (tags in region / length of region) / (sum of all tags present in all regions). Something like this?
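The formula proposed in this reply can be written out directly. This is a minimal sketch, assuming regions are given as hypothetical `(chrom, start, end)` tuples with half-open coordinates so that length is `end - start`; the function name `region_normalise` and the example counts are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical region representation for this sketch: (chrom, start, end),
# half-open, so the region length is end - start.
Region = Tuple[str, int, int]


def region_normalise(counts: Dict[Region, int]) -> Dict[Region, float]:
    """Apply norm = (tags in region / region length) / total tags in all regions."""
    total_tags = sum(counts.values())
    if total_tags == 0:
        raise ValueError("no tags in any region; cannot normalise")
    result = {}
    for region, tags in counts.items():
        _chrom, start, end = region
        length = end - start
        if length <= 0:
            raise ValueError(f"invalid region length for {region}")
        result[region] = (tags / length) / total_tags
    return result


counts = {
    ("chr1", 1000, 2000): 50,  # 1,000 bp region
    ("chr1", 5000, 5500): 25,  # 500 bp region
    ("chr2", 0, 2000): 25,     # 2,000 bp region
}
print(region_normalise(counts))
```

Because the denominator is the total over the counted regions rather than the whole library, the normalised values change whenever the set of regions changes; the RPKM variant suggested in the next reply uses the total number of aligned tags instead.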

10.4 years ago by ChIP ▴ 600

You could try RPKM (reads per kilobase per million mapped reads), which is similar to the equation you gave above, with the region length expressed in kilobases and the "sum of all the tags" replaced by the "total aligned tags (in millions)".
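A minimal sketch of the RPKM calculation described here, i.e. tags in region / (region length in kb × total aligned tags in millions). The function name `rpkm` and the 2,000 bp region with its tag counts are hypothetical; the library sizes are the 29 and 47 million reads from the question.

```python
def rpkm(tags_in_region: int, region_length_bp: int, total_aligned_tags: int) -> float:
    """Reads per kilobase of region per million aligned tags."""
    if region_length_bp <= 0 or total_aligned_tags <= 0:
        raise ValueError("region length and total aligned tags must be positive")
    length_kb = region_length_bp / 1_000
    total_millions = total_aligned_tags / 1_000_000
    return tags_in_region / (length_kb * total_millions)


# Library sizes from the question; the region and tag counts are hypothetical.
print(rpkm(290, 2_000, 29_000_000))  # → 5.0
print(rpkm(470, 2_000, 47_000_000))  # → 5.0
```

Unlike the per-region formula above, the denominator here depends only on the library, not on which regions were counted, so values stay comparable if you later add or drop regions.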

10.4 years ago by vj ▴ 520