Question

No genes' ID in peak annotations during ChiP-Seq with ChIPpeakAnno R package

0

Entering edit mode

9.8 years ago

MarcinKosinski ▴ 10

I am trying to perform Chip-Seq Analysis which is greatly described in this conversation. http://biology.stackexchange.com/questions/16701/introduction-to-chip-seq. The same as author of mentioned question I am a student of Applied Mathematics and I am doing my best to enter bioinformatics fields.

I am basically at the moment where I would like to annotate peaks with genes' IDs. To do this I am using ChIPpeakAnno R package from Bioconductor. I used below code to annotate my reads (outputs from MACS) with genes' ID's

library(ChIPpeakAnno)
# data(package = "ChIPpeakAnno")$results[,3]
macsOutput <- toGRanges(data="example_peaks.bed",
                        format = "MACS")

data(TSS.human.GRCh37)
macs.anno <- annotatePeakInBatch(macsOutput,
                                 AnnotationData=TSS.human.GRCh37,
                                 output="overlapping", maxgap=5000L)
library(org.Hs.eg.db)
macs.anno <- addGeneIDs(annotatedPeak=macs.anno,
                        orgAnn="org.Hs.eg.db",
                        IDs2Add="symbol")

# no annotations for some genes
as.character(head(as.data.frame(macs.anno)$symbol))
[1] "PTCHD2" "PTCHD2" "PTGER3" NA       "HFM1"   NA

but there occurs that there are no genes' annotations for some peaks. Can anyone tell me why this might happen? And how to avoid this? Does this refer to the maxgap=5000L parameter? When creating output from MACS I set a parameter for length to be 10,000.

ChIP-Seq annotation • 2.7k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.8 years ago by MarcinKosinski ▴ 10

score 0 · Answer 1 · 2021-03-18

According to the documentation, this command below (by setting the "overlapping" and "5000L") only outputs overlapped features that are within the maximum gap (specified by the maxgap argument) between the peak range and the feature range. It is possible that some peak ranges are far away (>5000L) from any features (e.g. genes in this case), thereby, no annotation will be outputted for them.

annotatePeakInBatch(macsOutput, AnnotationData=TSS.human.GRCh37, output="overlapping", maxgap=5000L)

To avoid this, you can either set a larger value for maxgap, or set the output argument to use "nearestLocation". To make it complete, the "gap" between two neighboring ranges refers to the number of positions that separate them. By convention, the gap between 2 immediate adjacent ranges is 0. And when either the start or end of one range is strictly inside the other range (i.e. non-disjoint ranges), the gap is defined to be -1.

For more Bioconductor-related questions, please refer to its support site: https://support.bioconductor.org. The site is actively monitored and answered by R package developers. You can search for the tag "ChIPpeakAnno" to view all previously answered questions there.