ChIP seq peak annotation
1
0
Entering edit mode
12 weeks ago
tsomakiank ▴ 40

Hello! I am trying to annotate H3K9me2 called peaks(macs2) to genes. I am using the homer's annotatePeaks.pl script for the annotation and from the resulting tsv file I am filtering for promoters and genes as the scrip tries to annotate the peaks to all the genomic features of the input gtf file. My problem is that after the filtering I get more than 18.000 annotated genes(almost the whole mouse genome) and that seems to be wrong. Is it something I am missing?

Extra information:

Peak calling is done with macs2(0.05 cutoff)

I only have one replicate and I use idr analysis for self consistency(reject peaks with idr >0.05)

Macs2 at the strand column outputs a dot (.) whereas annotatePeaks.pl at the help page needs a + or - strand information.

Homer ChIP-seq annotatePeaks.pl • 492 views
1
Entering edit mode
12 weeks ago

Hard to tell from the sparse information that you provide, but since H3K9me2 is linked to gene repression, I would not per se reject that the majority of genes in a cell type are inactive at any given time. Also mind that differentiated cells harbour large repressive domains called LOCKS that might get annotated with numerous features at once.

Have you run some exploratory analysis, how many peaks you get and how big they are? In particular, if you have no replicates and only a treatment vs. input call, you might already have a lot of noise in your peak calls. You might also want to try SICER instead for peak calling, which is better suited for histone ChIP-seq.

PS: There is no strand specificity in ChIP-seq. Some tools output a . , others just put + as default, but the meaning is the same. So you can just replace the dots with plus signs if needed by the downstream tool.

1
Entering edit mode

Thanks a lot Mathias. Your answer is very informative to me since a didnt know neither sicer nor the strand output. Thanks a lot!

0
Entering edit mode

You are very welcome.