No genes' ID in peak annotations during ChiP-Seq with ChIPpeakAnno R package
1
0
Entering edit mode
8.6 years ago

I am trying to perform Chip-Seq Analysis which is greatly described in this conversation. http://biology.stackexchange.com/questions/16701/introduction-to-chip-seq. The same as author of mentioned question I am a student of Applied Mathematics and I am doing my best to enter bioinformatics fields.

I am basically at the moment where I would like to annotate peaks with genes' IDs. To do this I am using ChIPpeakAnno R package from Bioconductor. I used below code to annotate my reads (outputs from MACS) with genes' ID's

library(ChIPpeakAnno)
# data(package = "ChIPpeakAnno")$results[,3]
macsOutput <- toGRanges(data="example_peaks.bed",
                        format = "MACS")

data(TSS.human.GRCh37)
macs.anno <- annotatePeakInBatch(macsOutput,
                                 AnnotationData=TSS.human.GRCh37,
                                 output="overlapping", maxgap=5000L)
library(org.Hs.eg.db)
macs.anno <- addGeneIDs(annotatedPeak=macs.anno,
                        orgAnn="org.Hs.eg.db",
                        IDs2Add="symbol")

# no annotations for some genes
as.character(head(as.data.frame(macs.anno)$symbol))
[1] "PTCHD2" "PTCHD2" "PTGER3" NA       "HFM1"   NA

but there occurs that there are no genes' annotations for some peaks. Can anyone tell me why this might happen? And how to avoid this? Does this refer to the maxgap=5000L parameter? When creating output from MACS I set a parameter for length to be 10,000.

ChIP-Seq annotation • 2.5k views
ADD COMMENT
0
Entering edit mode
3.1 years ago
hukai916 ▴ 10

According to the documentation, this command below (by setting the "overlapping" and "5000L") only outputs overlapped features that are within the maximum gap (specified by the maxgap argument) between the peak range and the feature range. It is possible that some peak ranges are far away (>5000L) from any features (e.g. genes in this case), thereby, no annotation will be outputted for them.

annotatePeakInBatch(macsOutput, AnnotationData=TSS.human.GRCh37, output="overlapping", maxgap=5000L)

To avoid this, you can either set a larger value for maxgap, or set the output argument to use "nearestLocation". To make it complete, the "gap" between two neighboring ranges refers to the number of positions that separate them. By convention, the gap between 2 immediate adjacent ranges is 0. And when either the start or end of one range is strictly inside the other range (i.e. non-disjoint ranges), the gap is defined to be -1.

For more Bioconductor-related questions, please refer to its support site: https://support.bioconductor.org. The site is actively monitored and answered by R package developers. You can search for the tag "ChIPpeakAnno" to view all previously answered questions there.

ADD COMMENT

Login before adding your answer.

Traffic: 1824 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6