ChIPseeker missing full annotation
6.6 years ago
rbronste ▴ 420

I am trying to assign GO and KEGG categories to some ChIP-seq peaks with ChIPseeker and am running into the following issue:

library("ChIPseeker")

library(TxDb.Mmusculus.UCSC.mm10.knownGene)

library(biomaRt)

library(rtracklayer)

library(org.Mm.eg.db)

peak <- readPeakFile("gains.bed")

peakAnno <- annotatePeak(peak, tssRegion = c(-3000, 3000), TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, annoDb = "org.Mm.eg.db")

During the annotatePeak step I get the following output and am not sure what it means exactly:

>> preparing features information...         2017-09-26 10:39:55 
>> identifying nearest features...       2017-09-26 10:39:55 
>> calculating distance from peak to TSS...  2017-09-26 10:39:55 
>> assigning genomic annotation...       2017-09-26 10:39:55 
>> adding gene annotation...             2017-09-26 10:40:08 
'select()' returned 1:many mapping between keys and columns
>> assigning chromosome lengths          2017-09-26 10:40:08 
>> done...                   2017-09-26 10:40:08
ChIPSeeker ChIP-Seq KEGG GO • 2.9k views
6.6 years ago
tarek.mohamed ▴ 360

Hi, the org.Mm.eg.db package is queried through the AnnotationDbi package. AnnotationDbi retrieves annotation with select() or mapIds(). Besides the database object itself, mapIds() has four main arguments: (1) "keytype" (equivalent to "filters" in the biomaRt package), (2) "column" (equivalent to "attributes" in biomaRt), (3) "keys" (equivalent to "values" in biomaRt), and (4) "multiVals". The "multiVals" argument specifies what mapIds() should do when multiple values could be returned for a single key (take a look at ?mapIds). Options include "first": when there are multiple matches, only the first one that comes back is returned. This is the default behavior.

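So I guess that in your case there are several matches between your gene IDs and the gene symbols and/or gene names, and that by default annotatePeak() returns the first value that comes back. Note that the "'select()' returned 1:many mapping" line is an informational message, not an error: your log still ends with ">> done...".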
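To see where a 1:many message like this comes from, here is a minimal sketch that queries org.Mm.eg.db directly. It assumes AnnotationDbi and org.Mm.eg.db are installed; the choice of the first 200 Entrez IDs and of the ENSEMBL column is only for illustration and is not taken from the original post.

```r
library(AnnotationDbi)
library(org.Mm.eg.db)

# Take a small set of Entrez IDs straight from the database
# (arbitrary illustrative subset, not the poster's genes)
ids <- head(keys(org.Mm.eg.db, keytype = "ENTREZID"), 200)

# select() returns one row per match, so keys with several matches
# produce extra rows and trigger the 1:many message
tab <- AnnotationDbi::select(org.Mm.eg.db, keys = ids,
                             columns = c("SYMBOL", "ENSEMBL"),
                             keytype = "ENTREZID")

# mapIds() returns one entry per key; multiVals controls how
# multiple matches are resolved
first_hit <- mapIds(org.Mm.eg.db, keys = ids, column = "ENSEMBL",
                    keytype = "ENTREZID", multiVals = "first")
all_hits <- mapIds(org.Mm.eg.db, keys = ids, column = "ENSEMBL",
                   keytype = "ENTREZID", multiVals = "list")

# Keys that map to more than one Ensembl ID
multi <- names(all_hits)[lengths(all_hits) > 1]
head(multi)
```

Comparing nrow(tab) with length(ids) shows how many extra rows the multiple matches add, and multi lists the keys responsible.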

It is also a good idea to visualize your sorted BED and BAM files, or the summits BED file, against a reference genome build; for this you can use, for example, the Golden Helix genome browser. Then you can compare what you see with the annotation result from ChIPseeker.

N.B. By default, annotatePeak() will annotate the peak summits to the nearest gene.

Tarek


I guess my issue is that as input I am using an output from DiffBind which does not entirely match the standard BED or narrowPeak formats. It seems what I need to do is retrieve those parts of the DiffBind input narrowPeak files that I get in the output - to maintain that format precisely.
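If the concern is only that the table does not follow BED or narrowPeak conventions, one option is to build the GRanges object yourself and keep the extra columns as metadata. The sketch below assumes GenomicRanges is installed; the file contents and the column names (seqnames, start, end, Fold, FDR) are hypothetical stand-ins, not the actual DiffBind output from this thread.

```r
library(GenomicRanges)

# Hypothetical DiffBind-style table: coordinates plus extra statistics
tmp <- tempfile(fileext = ".txt")
writeLines(c("seqnames\tstart\tend\tFold\tFDR",
             "chr1\t10000000\t10000500\t1.8\t0.01",
             "chr2\t50000000\t50000500\t2.3\t0.002"), tmp)

tab <- read.delim(tmp, header = TRUE, stringsAsFactors = FALSE)

# Convert to GRanges; non-coordinate columns are kept as metadata columns
peaks_gr <- makeGRangesFromDataFrame(tab, keep.extra.columns = TRUE)
peaks_gr
```

Since annotatePeak() accepts a GRanges object as its peak argument, an object built this way can be passed to it directly instead of going through readPeakFile().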


You can also try to build the GRanges object with the ChIPpeakAnno package and compare it with the GRanges object generated by ChIPseeker. In the end, annotatePeak() uses the GRanges object for peak annotation.
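As a rough sketch of that last point, a GRanges object can be handed to annotatePeak() directly and the result inspected as a data frame. This assumes the same packages as in the question are installed; the two intervals are arbitrary illustrative coordinates on mm10, not peaks from gains.bed.

```r
library(ChIPseeker)
library(GenomicRanges)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

# Two arbitrary illustrative intervals (not real peaks from the post)
toy_peaks <- GRanges(seqnames = c("chr1", "chr2"),
                     ranges = IRanges(start = c(10000000, 50000000),
                                      width = 500))

peakAnno <- annotatePeak(toy_peaks, tssRegion = c(-3000, 3000),
                         TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                         annoDb = "org.Mm.eg.db")

# Flatten the result to check which annotation columns were filled in
anno_df <- as.data.frame(peakAnno)
colnames(anno_df)

# Count peaks that ended up without a gene symbol
sum(is.na(anno_df$SYMBOL))
```

Running the same check on your own peaks shows whether any of them are actually missing annotation, or whether the 1:many message was the only thing that looked wrong.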
