bamCoverage bin size for narrow and broad histone marks
1
0
19 months ago
Marco Pannone ▴ 490

Hey everybody!

I am performing ChIP-seq data analysis and am currently generating .bigWig files using bamCoverage from the deepTools suite. I have data for both narrow histone marks, such as H3K4me3, and broader marks, such as H3K36me3.

I am aware that I should probably choose appropriate integer values for the --binSize and --smoothLength parameters, but I am not sure which values are most appropriate in each case.

Any opinion would be very much appreciated!

Thanks :)

ChIP-Seq deeptools • 1.8k views
1
19 months ago
2nelly ▴ 310

You should not worry too much about binSize; it just defines the resolution at which your data will be visualized. Obviously, you are not going to use sizes like 100 kb or 1 Mb.

Ideally, you can play with values between 1 kb and 10 kb (e.g. 1000, 5000, 10000); keep in mind that a smaller bin increases the computational time. For TFs, binSize can be reduced further. Regarding smoothLength, in my humble opinion I would suggest something between 3x and 5x the binSize; most of the time I use 4x.

For instance:

1kb binSize --> smoothLength can be 4kb

10kb binSize --> smoothLength can be 40kb.
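For instance, the 1 kb bin with 4x smoothing suggested above could look like this as a bamCoverage call (the filenames, CPM normalization, and thread count are my own placeholders, not part of the answer):

```shell
BIN=1000              # 1 kb bins, as suggested for broad marks
SMOOTH=$((BIN * 4))   # 4x rule of thumb -> 4 kb smoothing window

# Build the command as a string so it can be inspected before running it
# on a real BAM file; H3K36me3.bam and the output name are hypothetical.
CMD="bamCoverage -b H3K36me3.bam -o H3K36me3.bw"
CMD="$CMD --binSize $BIN --smoothLength $SMOOTH --normalizeUsing CPM -p 4"
echo "$CMD"
```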

Since you have ChIP-seq data, I would propose removing duplicates and keeping only the uniquely mapped reads from your alignment file. This will further improve visualization and data interpretation.
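One way to sketch that filtering with samtools, using standard SAM flag bits (the MAPQ cutoff of 30 is an assumption that depends on your aligner; bamCoverage can also filter on the fly with --ignoreDuplicates and --minMappingQuality):

```shell
# SAM flag bits (from the SAM specification):
#   4    = read unmapped
#   1024 = PCR/optical duplicate
FILTER=$((4 + 1024))   # drop both categories in one pass

# Printed rather than executed here; sample.bam is a placeholder name.
echo "samtools view -b -F $FILTER -q 30 sample.bam > sample.filtered.bam"
echo "samtools index sample.filtered.bam"
```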

2

I disagree here: the binSize in bamCoverage defines how many adjacent bases are averaged into a single value when making the bigWig file, and the default is 50 bases. I personally always use 1 bp, as (to my eye) larger bins look ugly in a genome browser and you lose the per-bp precision, which can be important depending on your application. Certainly do not use anything big, and definitely not 1 kb; that would interfere with the visualization, since the signal over an entire 1 kb interval would be compressed into a single value.
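In other words, a per-base track as described here would just be the following (filenames and thread count are placeholders):

```shell
BIN=1   # per-base bins: maximum precision, larger files, longer run time
CMD="bamCoverage -b H3K4me3.bam -o H3K4me3.bw --binSize $BIN -p 8"
echo "$CMD"
```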

0

Thanks for the reply! So you would also suggest not setting any value for smoothLength? Also, when it comes to bamCompare, in case I want to normalize the ChIP sample over its input, would you still recommend a binSize of 1 bp and no smoothLength?
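For reference, the bamCompare version of that would look roughly like this (the filenames and the log2-ratio operation are my assumptions, not a recommendation from the thread):

```shell
# log2 ratio of ChIP over its input at 1 bp bins; subtract is the
# main alternative for --operation. All filenames are hypothetical.
CMD="bamCompare -b1 H3K4me3.bam -b2 input.bam -o H3K4me3.log2ratio.bw"
CMD="$CMD --operation log2 --binSize 1"
echo "$CMD"
```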

0

I personally never use smoothing here. I usually load the bigWigs into R to make whatever plot I want and do any smoothing (if necessary) during that process, but I keep the original bigWig file with the actual raw data. I also never do input normalization, as there is (to my taste) no reliable method for it out there; I only use the input to call peaks against.

0

I agree; I also mostly use input normalization only during peak calling, while for genome browser visualization I prefer to avoid it (otherwise I would not be able to get an idea of my background signal). I am currently proceeding with the default binSize of 50 bp to see how it looks. Thanks again for your opinion, much appreciated :)

0

I don't think that 1-base precision is necessary for histone marks. Of course, smaller bins are more useful for TFs, as I said; using such a small bin on broad signals will make the graph look oversegmented. Certainly, the selection of binSize depends on the nature of the experiment, and in my opinion 1 bp for a histone mark is not ideal. Visualization at 1 kb resolution should work well. In any case, you can try as many different parameters as you want and see what fits your needs best.

I see no reason to use a bigWig for peak calling rather than the alignment file directly. I presume you want the bigWig visualization for aesthetic purposes (article figures etc.).

0

Yes indeed, I want to produce .bigWig files simply to inspect signal enrichment in the genome browser and take snapshots of interesting regions; I am not producing these files as a preliminary step to peak calling.

0

Then I see no reason to spend time and storage space on very high resolution. If you want to show large regions of ~1 Mb, go for lower resolution, as there will be no visible difference. If you want to show small, gene-sized regions, then invest the time and space to produce higher resolution. There is no standard way; you always have to adjust according to your needs.

2

I strongly encourage you not to use a 1 kb bin size for something like H3K4me3; your resolution there will be terrible. For most cases I wouldn't go over 50 bp.

0

Thanks for your answer! I have already filtered all the unmapped reads and duplicates out of my .bam files and kept only the best alignment for the multimappers. The default parameters use a bin size of only 50 bp, so maybe I should indeed go for larger bins, as you suggested.

I guess there will be a lot of playing around with binSize and smoothLength until I get a satisfactory track visualization in the genome browser.