Question: ChIPseeker missing full annotation
rbronste wrote (2.3 years ago):

I am trying to assign GO and KEGG categories to some ChIP-seq peaks with ChIPseeker and am running into the following issue:

library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(biomaRt)
library(rtracklayer)
library(org.Mm.eg.db)

peak <- readPeakFile("gains.bed")
peakAnno <- annotatePeak(peak, tssRegion = c(-3000, 3000), TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, annoDb = "org.Mm.eg.db")

During the annotatePeak step I get the following output, and I am not sure what the 'select()' line means exactly:

>> preparing features information...         2017-09-26 10:39:55 
>> identifying nearest features...       2017-09-26 10:39:55 
>> calculating distance from peak to TSS...  2017-09-26 10:39:55 
>> assigning genomic annotation...       2017-09-26 10:39:55 
>> adding gene annotation...             2017-09-26 10:40:08 
'select()' returned 1:many mapping between keys and columns
>> assigning chromosome lengths          2017-09-26 10:40:08 
>> done...                   2017-09-26 10:40:08
Tags: go, kegg, chip-seq, chipseeker

tarek.mohamed wrote (2.3 years ago):

Hi, the org.Mm.eg.db package is queried through the AnnotationDbi package. AnnotationDbi can do the annotation via the mapIds() function, which has four main arguments: (1) "keytype" (equivalent to "filters" in the biomaRt package), (2) "column" (equivalent to "attributes" in biomaRt), (3) "keys" (equivalent to "values" in biomaRt), and (4) "multiVals". The "multiVals" argument specifies what mapIds() should do when multiple values could be returned for a single key (take a look at ?mapIds). One option is "first": when there are multiple matches, only the first value that comes back is returned. This is the default behavior.

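So, I guess that in your case some gene IDs match more than one entry in the annotation columns (gene symbols, gene names, or other IDs), and by default annotatePeak() keeps the first value that comes back. Note that the 'select()' line is an informational message rather than an error: your log goes on to reach ">> done...".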
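To see which keys actually have more than one match, the two multiVals modes can be compared side by side. This is only a minimal sketch: it assumes AnnotationDbi and org.Mm.eg.db are installed, it takes an arbitrary handful of Entrez IDs from the database rather than the genes from your peaks, and it uses the ENSEMBL column purely as an example of a column where 1:many mappings are common. It does not claim to reproduce what annotatePeak() does internally.

```r
# Sketch only: compare multiVals = "first" with multiVals = "list".
library(AnnotationDbi)
library(org.Mm.eg.db)

# Illustrative keys (assumption): the first 200 Entrez IDs in the database.
# In a real analysis these would be the gene IDs from the peak annotation.
gene_ids <- head(keys(org.Mm.eg.db, keytype = "ENTREZID"), 200)

# Default behavior: one value per key, the first match that comes back.
first_hit <- mapIds(org.Mm.eg.db, keys = gene_ids,
                    column = "ENSEMBL", keytype = "ENTREZID",
                    multiVals = "first")

# Keep every match: returns a list with one character vector per key.
all_hits <- mapIds(org.Mm.eg.db, keys = gene_ids,
                   column = "ENSEMBL", keytype = "ENTREZID",
                   multiVals = "list")

# Keys with more than one match are the source of the 1:many message.
n_per_key  <- lengths(all_hits)
multi_keys <- names(n_per_key)[n_per_key > 1]
all_hits[multi_keys]
```

If multi_keys is empty for your own gene IDs, nothing was dropped by taking the first match.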

It is also worth visualizing your sorted BED and BAM files, or the summits BED file, against a reference genome build; for this you can use, for example, the Golden Helix genome browser. Then you can compare what you see there with the annotation result from ChIPseeker.

N.B. By default annotatePeak() will annotate the peak summits to the nearest gene.

Tarek


I guess my issue is that as input I am using an output from DiffBind which does not entirely match the standard BED or narrowPeak formats. It seems what I need to do is retrieve those parts of the DiffBind input narrowPeak files that I get in the output - to maintain that format precisely.

(reply written 2.3 years ago by rbronste)
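One way around the file-format mismatch is to skip BED/narrowPeak parsing and hand annotatePeak() a GRanges object, which it accepts in place of a file name. The sketch below is an assumption-heavy illustration: the toy data frame stands in for a DiffBind report read from disk, and the column names (seqnames, start, end, Fold, FDR) and coordinates are made up for the example, so check them against your own file. As far as I know, DiffBind's dba.report() can also return a GRanges object directly, which would avoid the file round trip altogether.

```r
# Sketch only: build a GRanges object from a DiffBind-style table.
library(GenomicRanges)

# Toy stand-in for the exported report (hypothetical columns and values).
# In practice this would come from something like read.delim() on your file.
report_df <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(3000500, 4500100, 7200300),
  end      = c(3001000, 4500700, 7200900),
  Fold     = c(1.8, 2.3, 1.5),
  FDR      = c(0.010, 0.002, 0.030)
)

# Coordinates become the ranges; the remaining columns are kept as metadata.
peak_gr <- makeGRangesFromDataFrame(report_df, keep.extra.columns = TRUE)
peak_gr

# The GRanges object can then replace the file-based peak object, e.g.:
# peakAnno <- annotatePeak(peak_gr, tssRegion = c(-3000, 3000),
#                          TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
#                          annoDb = "org.Mm.eg.db")
```

Keeping the extra columns means the fold change and FDR travel with each peak into the annotation output.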

You can also try to build a GRanges object with the ChIPpeakAnno package, then compare it with the GRanges object generated by ChIPseeker. Ultimately, annotatePeak() uses the GRanges object for peak annotation.

(reply written 2.3 years ago by tarek.mohamed)