HOMER mergePeaks totals don't add up to peaks input
Entering edit mode
6.2 years ago
steve ★ 3.2k

I am using HOMER's mergePeaks tool (homer/v4.6) to find the overlapping peaks between different bed files, for downstream visualization. However, the resulting number of peaks do not match the number of peaks that I started with. I have files that look like this:

# a set of ChIP-Seq peaks
$ wc -l Sample1.bed
106536 Sample1.bed

$ head Sample1.bed
#chr    start   end name    conc
chr1    10009   10438   chr1:10009-10438    1.93238645557258
chr1    710223  710919  chr1:710223-710919  2.6191877990312
chr1    712387  715259  chr1:712387-715259  6.36598230427597
chr1    752136  752850  chr1:752136-752850  2.77465732982152
chr1    755212  755710  chr1:755212-755710  2.15283141685661
chr1    756766  759022  chr1:756766-759022  4.05288454015793
chr1    760940  763718  chr1:760940-763718  6.44867373331963
chr1    772008  781353  chr1:772008-781353  6.86725737719796
chr1    800093  801813  chr1:800093-801813  4.41658071582566

# TSS regions from Gencode
$ wc -l gencode.bed
105785 gencode.bed

$ head gencode.bed
chr1    1868    22010
chr1    19553   39554
chr1    20266   40366
chr1    42472   62473
chr1    43048   63049
chr1    52947   72948
chr1    59090   79091
chr1    121024  141025
chr1    150445  170446
chr1    307719  327730

And the command I am using is this:

mergePeaks Sample1.bed gencode.bed -prefix mergepeaks -venn venn.txt

The stdout stream even displays the correct number of peaks:

mergePeaks Sample1.bed gencode.bed -prefix mergepeaks -venn venn.txt
    Max distance to merge: direct overlap required (-d given)
    Merging peaks...
    Comparing Sample1.bed (106535 total) and Sample1.bed (106535 total)
    Comparing Sample1.bed (106535 total) and gencode.bed (105785 total)
    Comparing gencode.bed (105785 total) and Sample1.bed (106535 total)
    Comparing gencode.bed (105785 total) and gencode.bed (105785 total)

However, the resulting files do not contain the full number of peaks.

$ cat venn.txt | cut -f3-
Total   Name
13229   gencode.bed
40836   Sample1.bed
19649   Sample1.bed|gencode.bed

13229 + 40836 + 19649 = 73714 peaks, which is less than the 100k that both sets started with. This is also reflected in the line counts for these files:

$ wc -l mergepeaks_*
  40837 mergepeaks_Sample1.bed
  19650 mergepeaks_Sample1.bed_gencode.bed
  13230 mergepeaks_gencode.bed
  73717 total

Any idea what is happening to these missing peaks?

HOMER • 4.2k views
Entering edit mode
6.2 years ago
steve ★ 3.2k

After a thorough investigation, I ran mergePeaks on the original gencode.bed file by itself and found that it contained a number of peaks that overlapped each other, thus reducing the total number true peaks by the same amount as were missing. This was corroborated by running bedtools merge -i on the file and producing output with the same number of entries. Mystery solved. The original HOMER outputs were indeed accurate.


Login before adding your answer.

Traffic: 2065 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6