What could be a reason for peaks called via Macs2 not appearing as real peaks in IGV?
2.1 years ago
Sam ▴ 250

I have called (broad) peaks with MACS2:

macs2 callpeak -t A1.bam -c WT1.bam -g mm --broad

One of the called peaks appears to have a fold enrichment of about 10:

chr  start     end       length  pileup  -LOG10(pv)  fold_enrichment  -LOG10(qv)
14   19415551  19419692  4142    37.35   29.87076    9.84997          24.22493

However, when I load this location in IGV, the control and the treatment look more or less the same.

In broad peak calling, the fold enrichment is calculated as the mean across the whole peak region, so it is a bit hard to judge by eye. Nevertheless, it does not look like a fold enrichment of ~10.

(Since MACS2 omits duplicates, I marked duplicates with Picard MarkDuplicates before loading the files and enabled duplicate filtering in IGV, so that is not the issue. MarkDuplicates did not take non-primary alignments into account, just as in MACS2.)

I would be glad for any ideas about what the issue could be.

macs2 igv • 1.0k views
The number of mismatches shown in the IGV tracks is ringing alarm bells for me, especially as the patterns differ between A1 and WT. Is this supposed to be an isogenic background?

Are you definitely looking at the right genome?

2.1 years ago
ATpoint 62k

This region is a common source of error. It is included in the ENCODE NGS blacklist for the mm10 genome. Blacklisted regions are, for example, low-complexity or repetitive regions that tend to attract false alignments from reads that actually originated from other parts of the genome.

If you load the BED file linked below into the browser, you will see that the peak overlaps a blacklisted region almost completely.

Here is a link to the mm10 blacklist in BED format: http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz

See also: https://www.nature.com/articles/s41598-019-45839-z
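If you would rather check the overlap programmatically than by eye, here is a minimal Python sketch. It assumes the blacklist is a standard three-column BED file; the function names (`read_bed`, `strip_chr`, `blacklisted_fraction`) are made up for this illustration and are not part of MACS2 or any other tool. Note that the peak table in the question uses `14` while the blacklist uses `chr14`-style names, so the sketch strips the `chr` prefix before comparing.

```python
import gzip


def read_bed(path):
    """Read (chrom, start, end) tuples from a plain or gzipped BED file."""
    opener = gzip.open if path.endswith(".gz") else open
    intervals = []
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def strip_chr(chrom):
    """Make '14' and 'chr14' comparable (assumption: only the prefix differs)."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def blacklisted_fraction(peak, blacklist):
    """Fraction of the peak (chrom, start, end) covered by blacklist intervals.

    The 1-based vs. 0-based start difference between the MACS2 xls output and
    BED is ignored here; it changes the result by at most one base.
    """
    chrom, start, end = peak
    # Keep only intervals that overlap the peak, clipped to the peak bounds.
    clipped = sorted(
        (max(start, b_start), min(end, b_end))
        for b_chrom, b_start, b_end in blacklist
        if strip_chr(b_chrom) == strip_chr(chrom) and b_start < end and b_end > start
    )
    covered = 0
    cursor = start  # everything left of the cursor has already been counted
    for c_start, c_end in clipped:
        c_start = max(c_start, cursor)
        if c_end > c_start:
            covered += c_end - c_start
            cursor = c_end
    return covered / (end - start)
```

Usage would look something like `blacklisted_fraction(("14", 19415551, 19419692), read_bed("mm10.blacklist.bed.gz"))`, assuming you have downloaded the file to the working directory. A value close to 1 means the peak lies almost entirely inside a blacklisted region and is a candidate for removal.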

Also, as you have noted, the data from the two samples should actually be normalized.
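To illustrate why normalization matters when comparing tracks by eye, here is a small sketch of one common approach, scaling read counts to counts per million (CPM) by library size. This is not how MACS2 computes its fold enrichment internally; the function names and the pseudocount are assumptions chosen for the example.

```python
def counts_per_million(region_count, library_size):
    """Scale a raw read count in a region to counts per million mapped reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return region_count * 1_000_000 / library_size


def normalized_fold(treat_count, treat_total, ctrl_count, ctrl_total, pseudocount=1.0):
    """Depth-normalized treatment/control ratio for one region.

    The pseudocount (an arbitrary choice here) avoids division by zero when
    the control has no reads in the region.
    """
    treat_cpm = counts_per_million(treat_count, treat_total)
    ctrl_cpm = counts_per_million(ctrl_count, ctrl_total)
    return (treat_cpm + pseudocount) / (ctrl_cpm + pseudocount)
```

For example, a treatment sample with 200 reads in a region out of 20 million mapped reads and a control with 100 reads out of 10 million have the same CPM, so the raw coverage tracks would look twofold different even though there is no enrichment after accounting for depth. The reverse can also happen: tracks that look alike may hide a real difference if the libraries differ in size.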


