Normalizing BAM Files
10.4 years ago
ChIP ▴ 600

Hi!

A very short question: I have two BAM files from a ChIP-seq experiment. File A has 29 million reads and File B has 47 million reads. The problem arises when I count the tags from these two files in the genomic regions of interest, because one file has more reads than the other.

Is there a way to normalise these two files?

I know that normalisation can also be done after counting the tags in regions (commonly referred to as region-based normalisation).

Thank you

chip-seq normalization • 4.2k views
10.4 years ago

You have it right. Normalization and analysis are done at the count level, not the BAM file level. There are a number of reasons for this, but the most important one is that the actual counts, not just the relative counts, matter in most statistical approaches to ChIP-seq data. You could down-sample your larger BAM file, but that would discard reads and would definitely be counterproductive.


Hi! So I should count the tags in each region and normalise like this: norm = ((tags in region / length of region) / sum of all tags present in all regions). Something like this?


You could try RPKM (reads per kilobase per million mapped reads), which is similar to the equation you have given above, with the length of the region expressed in kilobases and the "sum of all the tags" replaced by "total aligned tags (in millions)".
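
To make the RPKM suggestion above concrete, here is a minimal Python sketch. The function name `rpkm`, its parameter names, and the per-region tag counts (290 and 470 in a 2 kb region) are made up for illustration; only the library sizes of 29 and 47 million reads come from the question. It assumes you already have a tag count per region from whatever counting tool you use.

```python
def rpkm(tags_in_region, region_length_bp, total_aligned_tags):
    """Reads per kilobase of region per million aligned tags.

    Hypothetical helper for illustration: all three inputs are plain numbers
    that you have already obtained from your own counting step.
    """
    if region_length_bp <= 0 or total_aligned_tags <= 0:
        raise ValueError("region length and total aligned tags must be positive")
    length_kb = region_length_bp / 1_000            # region length in kilobases
    total_millions = total_aligned_tags / 1_000_000  # library size in millions
    return tags_in_region / (length_kb * total_millions)


# Library sizes from the question; the region counts are invented examples.
rpkm_a = rpkm(tags_in_region=290, region_length_bp=2_000, total_aligned_tags=29_000_000)
rpkm_b = rpkm(tags_in_region=470, region_length_bp=2_000, total_aligned_tags=47_000_000)
print(rpkm_a, rpkm_b)
```

In this made-up case the raw counts differ (290 vs 470) only in proportion to the library sizes, so both regions end up with the same RPKM. Keep in mind that this is a simple scaling for comparison or visualisation; as noted in the answer above, most statistical approaches want the raw counts as input.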
