Homer finds same peak multiple times
2.6 years ago

I am using Homer to identify peaks in RNA-seq data and then determine differential expression by counting reads per peak. Homer has a lovely script that does just this: getDifferentialPeaksReplicates.pl. The issue is that Homer returns the same peak multiple times in its final output, each time under a different peak ID. (Bonus question: how does Homer produce different statistics for the same peak?) Here is the command I am using:

getDifferentialPeaksReplicates.pl \
-genome mm10 \
-style factor \
-size 25 \
-minDist 1 \
-fdr 0.05 \
-P 0.1 \
-all \
-t \
${dir}CS1/CS1_R1_TagDirectory \
${dir}CS2/CS2_R1_TagDirectory \
${dir}CS5/CS5_R1_TagDirectory \
${dir}CS6/CS6_R1_TagDirectory \
-b \
${dir}CS3/CS3_R1_TagDirectory \
${dir}CS4/CS4_R1_TagDirectory \
${dir}CS7/CS7_R1_TagDirectory \
${dir}CS8/CS8_R1_TagDirectory \
-i \
${dir}CS9/CS9_R1_TagDirectory \
${dir}CS10/CS10_R1_TagDirectory \
> ${dir}/CS_SampleTagDirectories/difPeaks.txt

Here is an example of the output:

#cmd=getDifferentialPeaksReplicates.pl -genome mm10 -style factor -size 25 -minDist 1-FDR 0.05 -P 0.1 -all -t /media/sf_UbuntuSharing/2104UNHX-0846/CS3/CS3_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS4/CS4_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS7/CS7_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS8/CS8_R1_TagDirectory -b /media/sf_UbuntuSharing/2104UNHX-0846/CS1/CS1_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS2/CS2_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS5/CS5_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS6/CS6_R1_TagDirectory -i /media/sf_UbuntuSharing/2104UNHX-0846/CS9/CS9_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS10/CS10_R1_TagDirectory|PeakID (cmd=annotatePeaks.pl 0.943238134802417.peaks mm10 -d /media/sf_UbuntuSharing/2104UNHX-0846/CS1/CS1_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS2/CS2_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS5/CS5_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS6/CS6_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS3/CS3_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS4/CS4_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS7/CS7_R1_TagDirectory /media/sf_UbuntuSharing/2104UNHX-0846/CS8/CS8_R1_TagDirectory -raw) (cmd=getDiffExpression.pl 0.943238134802417.raw.txt bg bg bg bg target target target target -norm2total -DESeq2 -fdr 0.05 -log2fold 1 -export 0.943238134802417)   Chr Start   End Strand  Peak Score  Focus Ratio/Region Size Annotation  Detailed Annotation Distance to TSS Nearest PromoterID  Entrez ID   Nearest Unigene Nearest Refseq  Nearest Ensembl Gene Name
chr9-1263   chr9    108945410   108945434   +   126.8   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-5451   chr9    108945410   108945434   +   102.8   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1527   chr9    108945410   108945434   +   126.8   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-2841   chr9    108945410   108945434   +   123.5   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-2268   chr9    108945410   108945434   +   125.7   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1475   chr9    108945410   108945434   +   122.5   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1476   chr9    108945410   108945434   +   121.4   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-3571   chr9    108945410   108945434   +   118.1   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-7461   chr9    108945410   108945434   +   83.1    0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-2842   chr9    108945410   108945434   +   120.3   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1370   chr9    108945410   108945434   +   123.5   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-4379   chr9    108945410   108945434   +   112.6   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1325   chr9    108945410   108945434   +   126.8   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1369   chr9    108945410   108945434   +   120.3   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-1791   chr9    108945410   108945434   +   125.7   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr9-6481   chr9    108945410   108945434   +   101.7   0.988   exon (NM_025407, exon 6 of 13)  exon (NM_025407, exon 6 of 13)  -8164   NM_007738   12836   Mm.6200 NM_007738   ENSMUSG00000025650  Col7a1
chr5-1382   chr5    125387930   125387954   +   100.6   0.706   exon (NM_019639, exon 2 of 2)   exon (NM_019639, exon 2 of 2)   2075    NM_019639   22190   Mm.331  NM_019639   ENSMUSG00000008348  Ubc
chr5-1298   chr5    125387930   125387954   +   107.1   0.734   exon (NM_019639, exon 2 of 2)   exon (NM_019639, exon 2 of 2)   2075    NM_019639   22190   Mm.331  NM_019639   ENSMUSG00000008348  Ubc
chr5-1225   chr5    125387930   125387954   +   103.9   0.715   exon (NM_019639, exon 2 of 2)   exon (NM_019639, exon 2 of 2)   2075    NM_019639   22190   Mm.331  NM_019639   ENSMUSG00000008348  Ubc
chr5-1226   chr5    125387930   125387954   +   107.1   0.718   exon (NM_019639, exon 2 of 2)   exon (NM_019639, exon 2 of 2)   2075    NM_019639   22190   Mm.331  NM_019639   ENSMUSG00000008348  Ubc
chr5-39105  chr5    121286308   121286332   +   103.9   0.988   exon (NM_181421, exon 12 of 76) exon (NM_181421, exon 12 of 76) 66101   NM_181421   269700  Mm.184589   NM_181421   ENSMUSG00000042744  Hectd4
chr3-1127   chr3    88747924    88747948    +   91.8    0.79    exon (NM_018804, exon 4 of 4)   exon (NM_018804, exon 4 of 4)   24663   NM_018804   229521  Mm.379376   NM_018804   ENSMUSG00000068923  Syt11
chr3-1100   chr3    88747924    88747948    +   94  0.792   exon (NM_018804, exon 4 of 4)   exon (NM_018804, exon 4 of 4)   24663   NM_018804   229521  Mm.379376   NM_018804   ENSMUSG00000068923  Syt11

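As you can see, some peaks are listed multiple times: the coordinates, strand, and annotation are identical, but the peak ID and peak score (and sometimes the focus ratio) differ. This is problematic because DESeq2 then tests the same region repeatedly, and the extra tests make the multiple-testing correction stricter, which inflates the adjusted p-values.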

How do you prevent Homer from listing the same peak more than once?
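
To make the question concrete, here is a minimal sketch of the de-duplication I have in mind. It is a workaround of my own, not a Homer option: it collapses rows that share Chr/Start/End/Strand and keeps the row with the highest Peak Score. The function name dedup_peaks and the demo rows are hypothetical, and the column positions (2-5 for coordinates, 6 for score) are assumed from the example output above.

```shell
# Collapse rows with identical Chr/Start/End/Strand (columns 2-5) and keep
# the row with the highest Peak Score (column 6). Column positions are an
# assumption based on the example output; check them against your own file.
dedup_peaks() {
  awk -F'\t' '
    NR == 1 { print; next }            # pass the header line through unchanged
    {
      key = $2 FS $3 FS $4 FS $5       # coordinates + strand identify a peak
      if (!(key in best)) {            # first time this peak is seen
        order[++n] = key
        best[key] = $0
        score[key] = $6 + 0
      } else if ($6 + 0 > score[key]) {  # a higher-scoring duplicate replaces it
        best[key] = $0
        score[key] = $6 + 0
      }
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] }  # first-seen order
  ' "$1"
}

# Small demo on a hypothetical file, truncated to the first six columns.
demo=$(mktemp)
printf 'PeakID\tChr\tStart\tEnd\tStrand\tPeak Score\n'      >  "$demo"
printf 'chr9-1263\tchr9\t108945410\t108945434\t+\t126.8\n'  >> "$demo"
printf 'chr9-5451\tchr9\t108945410\t108945434\t+\t102.8\n'  >> "$demo"
printf 'chr5-1382\tchr5\t125387930\t125387954\t+\t100.6\n'  >> "$demo"
printf 'chr5-1298\tchr5\t125387930\t125387954\t+\t107.1\n'  >> "$demo"
dedup_peaks "$demo"
rm -f "$demo"
```

On the demo rows this keeps chr9-1263 and chr5-1298. Note that filtering difPeaks.txt this way only tidies the table; the adjusted p-values would still come from the redundant set of tests. To change the statistics, the same filter would presumably have to be applied to the peak file before the annotatePeaks.pl -raw and getDiffExpression.pl steps shown in the output header, which I have not verified.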

rnaseq peaks homer counts