Remove reads counts overlapping with blacklist regions in BAM files
2
6
Entering edit mode
5.8 years ago
armandorp ▴ 60

Hi all,

I am preprocessing some bam files for differential binding analysis. I have read that some people recommend removing blacklist regions (human) before to do the peak calling. Do you have any suggestions how directly extract the read counts in these regions using maybe samtools?

Thanks, Armando

ChIP-Seq samtools • 8.7k views
3
Entering edit mode

Use intersectBed with -v param

3
Entering edit mode
5.4 years ago

Good suggestions.

Keep in mind - when you perform a cross strand correlation analysis to assess peak quality in your data, if you do not exclude blacklisted reads prior to this analysis (which generally uses aligned reads or bed converted intervals from these reads), then you will include these regions in your analysis which may skew the results one way or the other.

Devon is right - the time savings will be nice to call peaks without this step and peak calling shouldn't really be affected much by these regions, depending on which peak caller you decide to use.

Just have a look at a few of the blacklisted regions after your alignment to see what kind of signal you're getting there, and then choose whether to use filtered alignment files or not for whichever downstream analyses (following Sukhdeep Singh's suggestion).

Cheers Paul

2
Entering edit mode

Good point regarding cross correlation with blacklisted regions included. Someone in our group mention maybe adding a cross correlation tools to deepTools. That'd properly ignore blacklisted regions (though we'd have to think about edge effects). Maybe I'll write something like that.

2
Entering edit mode
5.8 years ago

Normally you don't need to modify or do anything with the alignments. After calling peaks, exclude regions that overlap a blacklisted region (use bedtools).

Relatedly, deepTools has an option to ignore reads/signal in blacklisted regions as of version 2.2 I think. This won't do what you need in this particular case, but for other ChIP-seq related things this is useful.

0
Entering edit mode

Thanks Devon. Do you mean to use deeptools before calling peaks to get a kind of marked BAM file for these regions? So you don´t recommend to remove blacklisted regions in the alignments, but the reads/ high-signal associated with blacklisted regions can affect the peak calling with MACS2.

0
Entering edit mode

I mean what I wrote. Don't bother excluding reads before peak calling (yes, you can do this with bamutils or bedtools if you want to wait a while), just exclude the peaks afterward. You might get poorer power in the direct vicinity of blacklisted regions, but that'll be a minor effect given the time savings.

0
Entering edit mode

Devon, you are not answering the question: can " the reads/ high-signal associated with blacklisted regions can affect the peak calling with MACS2? Also
I tried removing the blacklisted regions using bedops so I had do bam to bed file conversion, after filtering am unable to convert the files back to bam am using bedtools to do the file conversions. can anyone please suggest a way to filter directly from the bam files, bedops does not take bam files and the blacklisted regions are in bed format

4
Entering edit mode

Use bedtools:

bedtools intersect -v -abam FILE.BAM -b BLACKLIST.BED > FILTERED.BAM

1
Entering edit mode

Blacklisted regions don't normally do much except cause false-positive peaks (inside the blacklisted regions). This isn't always the case, but if you have nice large peaks versus input then it will be. If you have really weak enrichment (for whatever reason) then you might gain something from stripping out reads in blacklisted regions (maybe I should write a little tool for that so it can be multithreaded).