Can anyone recommend some sources for reliable DNA copy number blacklists for excluding unreliable regions of the genome prior to copy number analysis of the human genome?
Hi,
For human hg38 copy number analysis, the go-to exclusion sets target mappability issues, repeats, and artifacts. Here's what I'd recommend; stick to these well-maintained ones:
ENCODE/DAC Blacklist (unified hg38): A comprehensive set of high-signal artifact regions. Download the BED from the ENCODE portal; it is also available via the Boyle Lab GitHub.
Duke Excluded Regions (lifted to hg38): Filters out low-mappability regions identified in the ENCODE pilot. Grab it from UCSC (it is hg19-based, so lift it over to hg38 with the UCSC liftOver tool): wgEncodeDukeMapabilityExcluded.
Unified Blacklist (Stuart Lab): Merges ENCODE + Duke for hg38, great for CNV pipelines. Direct BED: stuartlab.org.
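Concatenate and sort the BED files, collapse them with bedtools merge, then intersect the merged set with your bins to drop the bins that overlap it. A recent 2025 review confirms ENCODE's is still the gold standard, but check how much of your data each set removes before committing to it. Avoid over-filtering segmental duplications (segdups) if your assay handles them.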
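For genome-wide data just use bedtools (e.g. `sort -k1,1 -k2,2n` then `bedtools merge`, followed by `bedtools intersect -v -a bins.bed -b merged.bed`), but if it helps to see what that combine-and-filter step does, here is a minimal pure-Python sketch. It assumes BED-style 0-based half-open coordinates and bins held as `(chrom, start, end)` tuples; the function names and the `max_fraction` threshold are hypothetical, added only to illustrate the "test overlaps for your data" idea. It is not any tool's actual implementation.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based, half-open.
Interval = Tuple[str, int, int]


def read_bed(path: str) -> List[Interval]:
    """Read the first three columns of a BED file, skipping headers and blank lines."""
    intervals: List[Interval] = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or book-ended intervals per chromosome,
    similar in spirit to sorting and then running `bedtools merge`."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlap_fraction(bin_: Interval, merged_blacklist: List[Interval]) -> float:
    """Fraction of a bin covered by an already-merged blacklist
    (merging first avoids counting the same bases twice)."""
    chrom, start, end = bin_
    if end <= start:
        raise ValueError(f"empty bin: {bin_}")
    covered = 0
    for b_chrom, b_start, b_end in merged_blacklist:
        if b_chrom == chrom:
            covered += max(0, min(end, b_end) - max(start, b_start))
    return covered / (end - start)


def filter_bins(bins: List[Interval], blacklist: List[Interval],
                max_fraction: float = 0.0) -> List[Interval]:
    """Keep bins whose blacklisted fraction is <= max_fraction (hypothetical knob).
    The default 0.0 drops any bin with >= 1 bp of overlap, like `bedtools intersect -v`."""
    merged = merge_intervals(blacklist)
    return [b for b in bins if overlap_fraction(b, merged) <= max_fraction]


if __name__ == "__main__":
    # Toy data: two ENCODE-like entries plus one Duke-like entry that touch on chr1.
    blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 350), ("chr2", 0, 50)]
    bins = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 300, 400),
            ("chr1", 400, 500), ("chr2", 0, 100)]
    print(filter_bins(bins, blacklist))
    # [('chr1', 0, 100), ('chr1', 400, 500)]
```

Raising `max_fraction` (say to 0.5) keeps bins that are only partly blacklisted, which is one way to avoid the over-filtering mentioned above; comparing how many bins survive at a few thresholds is a quick check before settling on one.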
Kevin
Please include the target species you are working on. The availability of these kinds of resources will vary considerably depending on how established your model system is. If you work on a non-model system, you might have to compile this list yourself, or filter results accordingly. For example, immune genes like MHC are particularly difficult to get an accurate copy number for without careful experimental design.
Human genome, ideally hg38