Question: Deeptools sample scaling
gravatar for Constantine
3.0 years ago by
Constantine210 wrote:


I have downloaded several ChIP-seq data (histone marks) and I'm trying to plot their coverage over the TSS

Since every file comes from a different sequencing experiment the sequencing depth varies from one sample to another. Therefore, what I'm trying to do is scale all the samples by a specific value.

The deeptools package offers that via the bamCompare --scaleFactor function. Is it sensible to calculate the mean coverage for each sample and then apply a scaling factor to normalise all the bam files to each other?

Moreover, I'm having trouble parsing the matrix file from the computeMatrix function into R. If anyone is familiar with deeptools any explanations with regard to what the matrix files represent would be much appreciated


sequencing chip-seq deeptools • 2.5k views
ADD COMMENTlink modified 2.3 years ago by DataFanatic110 • written 3.0 years ago by Constantine210
gravatar for Fidel
3.0 years ago by
Fidel1.8k wrote:

bamCompare is used when you have two bam files. For ChIP-seq there is normally the ChIP sample and the so called input, bamCompare is used to compute the log2ratio or the difference. If you have for all your histone marks the ChIP and input, you can use bamCompare which will produce normalized bigwig files

If you don't have an input, you can use bamCoverage which can normalize the values using two methods: normalizeTo1x and normalizeUsingRPKM. Either method will produce output that is comparable across samples.

You can write to the deepTools mailing list to get support on the matrix issue (


ADD COMMENTlink written 3.0 years ago by Fidel1.8k
gravatar for  DataFanatic
2.3 years ago by
DataFanatic110 wrote:

Can someone please explain how the normalize To 1x and normalize using RPKM normalization is accomplished in bamCoverage ?? Please add a detail description, Thanks!

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by DataFanatic110

It's best to post things like this as a new question.

Anyway, these normalization methods basically come up with a number against which the signal is multiplied. For the "normalize to 1x" method, the total number of alignments that will be used in the file and their lengths are estimated. These values are then multiplied together and divided by the effective genome size that you input. That yields the average coverage that's expected in a particular sample. The signal in each bin is then divided by it (technically it's multiplied by 1/that number, but the results are the same).

For RPKM, the procedure is similar, with the number of alignments in the file estimated, the results divided by 1 million and multiplied by the bin size in kb). That scale factor is then used as above in each bin.

You can see the full code for this in in the source code (in particular, see the get_scale_factor function).

ADD REPLYlink written 2.3 years ago by Devon Ryan86k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1643 users visited in the last hour