5
6
6.7 years ago

ChIP-Seq blacklist • 16k views
17
3.2 years ago
igor 13k

For those who still land on this question, there is now an updated version (v2) of the blacklists available here: https://github.com/Boyle-Lab/Blacklist/tree/master/lists

These blacklists are described in The ENCODE Blacklist: Identification of Problematic Regions of the Genome (June 2019):

Here, we define the ENCODE blacklist- a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.

6
5.7 years ago

Just as an update for those in need of more recent blacklist regions (ce10, mm10, hg38): Anshul Kundaje supplies them here.

1

This list was reported to have overlaps. I looked into it and found the overlapping pair:

chr16 34586660 34587100
chr16 34587060 34587660

Does anyone have an updated or fixed version?

Thanks,

Weiyan

0

I'm afraid I don't understand the issue. Can you elaborate?

1

When I use this file as a blacklist with deepTools bamCompare, it reports that the list contains overlapping regions:

chr16 34586660 34587100
chr16 34587060 34587660

My understanding is that the second range overlaps the first one: it starts at 34587060, which is before the first range ends at 34587100.
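To check a whole blacklist for this kind of problem, something like the following minimal Python sketch could be used. It is not part of deepTools or bedtools; the function name `find_overlaps` and the extra chr1 region are made up for illustration, and it assumes BED-style half-open intervals, so regions that merely touch are not counted as overlapping.

```python
def find_overlaps(intervals):
    """Return (earlier, later) pairs of overlapping regions.

    intervals: iterable of (chrom, start, end) tuples, BED-style half-open.
    """
    overlaps = []
    current = None  # region with the furthest end seen so far on this chromosome
    for region in sorted(intervals):
        same_chrom = current is not None and region[0] == current[0]
        if same_chrom and region[1] < current[2]:
            # This region starts before the tracked region ends.
            overlaps.append((current, region))
            if region[2] > current[2]:
                current = region
        else:
            current = region
    return overlaps


regions = [
    ("chr16", 34586660, 34587100),
    ("chr16", 34587060, 34587660),
    ("chr1", 100, 200),  # hypothetical non-overlapping region
]
for earlier, later in find_overlaps(regions):
    print(earlier, "overlaps", later)
```

With the two chr16 lines from the blacklist this reports a single overlapping pair.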

2

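You could use bedtools merge -i blacklist.bed to merge those overlapping regions. Note that bedtools merge expects input sorted by chromosome and start position, so sort the file first if it isn't already.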
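If bedtools is not at hand, the same idea can be sketched in a few lines of Python. This is only an illustration of interval merging, not the bedtools implementation; `merge_intervals` is a made-up name, and the sketch assumes that touching (book-ended) regions should be merged as well, which as far as I know is also what bedtools merge does by default. Any extra BED columns (such as a reason column) are ignored here.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) regions."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [
    ("chr16", 34586660, 34587100),
    ("chr16", 34587060, 34587660),
]
for chrom, start, end in merge_intervals(regions):
    # Tab-separated, as expected for a BED file.
    print(chrom, start, end, sep="\t")
```

For the two chr16 regions above, this yields one region spanning 34586660 to 34587660.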

0

Thank you very much! It works now.

0

I downloaded the hg38 blacklist from Anshul Kundaje's repository as well, and I found not only the overlap @weiyanjia2008 mentioned, but also this one:

chr20 31067930 31069060
chr20 31069000 31069280

These two regions should also be merged. Merging both pairs (the chr16 regions you identified and the chr20 regions I found) solved the problem for me. I just emailed Anshul Kundaje to report the issue, in case he hasn't noticed it yet, so that it can be fixed and the next person who downloads the file doesn't face the same problem.

0

I have been working on these things lately. Here is some more information.

0

Hi @Friederike and @venu, why are the hg38 and hg19 lists different? EDIT: The hg38 list seems to be smaller, probably because many regions have been fixed in the hg38 assembly. Does the hg38 blacklist also contain mitochondrial homologs? I believe the homologs remain whether it is hg19 or hg38. Any insights on this? Thanks!

0

I have no deep insights into the specific differences between hg38 and hg19, but hg38 is generally considered to contain more "bait" sequences (meant to scavenge reads that probably fell into some of the blacklisted regions before). I recommend contacting Anshul Kundaje directly (and ideally sharing his response here).

0

Hi, can you comment on this issue: https://github.com/Boyle-Lab/Blacklist/issues/11

I really don't understand how the two files differ and which one should ultimately be used.

2
4 months ago
ATpoint 65k

From "stable" repositories in 2022:

The general ENCODE blacklists at https://github.com/Boyle-Lab/Blacklist/raw/master/lists/

The ATAC-seq blacklists from the Buenrostro lab (mitochondrial homologs in the genome) at https://github.com/buenrostrolab/mitoblacklist/tree/master/peaks

1
6.7 years ago

There's not yet a blacklist available for each species or even each version of mouse/human. You can get the mm9 blacklist here. The equivalent for hg19 is here. There's no equivalent for hg38 and I'm not sure that lifting things over will work, though you could certainly try. We have an mm10 version of this, but I don't know that it's publicly available.

1
3.3 years ago
geocarvalho ▴ 280

If someone wants the SV exclude list, 10x Genomics has a topic about the SV Calling Filter File.