Question: Comparing input-subtracted ChIP read profiles using deepTools v3.4.3
Ian (University of Manchester, UK) wrote, 4 months ago:

I have used the Deeptools2 packages to produce a plot comparing profiles of histone binding relative to gene starts and ends. However, the first sample only reaches about 2 on the Y-axis, and the second lies around 0, so I am wondering if I am using it incorrectly. In particular, I am unsure about the normalisation between the merged ChIP reads (which have at least 3x the coverage of the single input sample) and the input. Does readCount normalisation normalise per million reads, for example? What is the scale of the Y-axis?

I have put my steps below in case anyone can spot the flaw. Thank you for any guidance!

1) Combined the triplicate mapped reads (sorted BAM) of each sample using samtools merge.
2) Used bamCompare to subtract an input sample (sorted BAM) from the merged reads from step 1.

bamCompare --bamfile1 merged_reps_sampleA.bam --bamfile2 input.bam --scaleFactorsMethod readCount --operation subtract --binSize 50 --blackListFileName blacklist_file.bed -p 8 --effectiveGenomeSize 2900000000 --ignoreDuplicates -o

3) Used computeMatrix to create a matrix of reads relative to canonical gene annotations.

computeMatrix scale-regions -S -R knownCanonical.bed --samplesLabel SampleA SampleB -p 8 -a 5000 -b 5000 -o SampleA_vs_SampleB_matrix.gz

4) Used plotProfile to produce a plot comparing the profiles of both samples.

plotProfile --matrixFile SampleA_vs_SampleB_matrix.gz --plotType lines --numPlotsPerRow 1 --perGroup -o SampleA_vs_SampleB_matrix.gz_geneProfile_TSS_TES_5k.png --regionsLabel "HG19 UCSC canonical genes"

Resulting plot: [image not shown]
Devon Ryan (Freiburg, Germany) answered:

You seem to be using deepTools 3.x rather than deepTools2.

Read count normalization boils down to the following bit of code:

import numpy as np

# Estimated number of mapped reads in each BAM after filtering
mapped_reads = [bam1_mapped, bam2_mapped]

# New scale factors, relative to the smaller of the two BAMs
scale_factors = float(min(bam1_mapped, bam2_mapped)) / np.array(mapped_reads)

bam1_mapped and bam2_mapped contain estimates of the number of mapped reads in each file after filtering, so the scale factor is 1.0 for the smaller file and something <1 for the larger file. For a given genomic bin, with the number of reads stored in tileCoverage[0] and tileCoverage[1]:

value1 = scale_factors[0] * tileCoverage[0]  # depth-normalized ChIP coverage in the bin
value2 = scale_factors[1] * tileCoverage[1]  # depth-normalized input coverage in the bin
bin_value = value1 - value2                  # value written to the bigWig

What the resulting bigWig contains, and what you have plotted, is bin_value, which in this case is the "depth normalized difference in coverage". It would seem that sample B had much lower normalized depth in this region; if this is unexpected, perhaps there are genomic regions that should be blacklisted.
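To make the scaling concrete, here is a small sketch with made-up read counts (the numbers are hypothetical, not from this thread) that reproduces the readCount scaling and per-bin subtraction described above:

```python
import numpy as np

# Toy mapped-read counts (hypothetical, for illustration only)
bam1_mapped = 30_000_000  # merged ChIP replicates
bam2_mapped = 10_000_000  # single input sample

mapped_reads = [bam1_mapped, bam2_mapped]
# Scale relative to the smaller file: input gets 1.0, ChIP gets 1/3 here
scale_factors = float(min(bam1_mapped, bam2_mapped)) / np.array(mapped_reads)
print(scale_factors)  # [0.33333333 1.        ]

# Raw read counts in one 50 bp bin (made up): ChIP, then input
tileCoverage = [90, 20]
value1 = scale_factors[0] * tileCoverage[0]  # 30.0 after scaling
value2 = scale_factors[1] * tileCoverage[1]  # 20.0 after scaling
bin_value = value1 - value2
print(bin_value)  # 10.0
```

So the Y-axis units are scaled read counts per bin, not counts per million: the larger file is scaled down to match the smaller one before the subtraction.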


Thank you, Devon, and for pointing out that I should be referring to it as deepTools 3; old habits die hard. Sample B was expected to be lower. Just to clarify: the Y-axis is the "depth normalized difference in coverage", i.e. the difference in coverage after normalisation? Is my use of bamCompare correct for what I am trying to achieve, i.e. showing the difference in histone binding profiles? Does "subtract" reduce the normalised ChIP coverage (per bin) by the normalised input coverage (per bin)?

— Ian

Yes to all of the questions in your comment.

— Devon Ryan