Question: ChIPseeker missing full annotation
rbronste wrote:

I am trying to assign GO and KEGG categories to some ChIP-seq peaks with ChIPseeker and am running into the following issue:

library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(biomaRt)
library(rtracklayer)
library(org.Mm.eg.db)

peak <- readPeakFile("gains.bed")
peakAnno <- annotatePeak(peak, tssRegion = c(-3000, 3000),
                         TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                         annoDb = "org.Mm.eg.db")

During the annotatePeak step I get the following error, and I am not sure what exactly it means:

>> preparing features information...         2017-09-26 10:39:55 
>> identifying nearest features...       2017-09-26 10:39:55 
>> calculating distance from peak to TSS...  2017-09-26 10:39:55 
>> assigning genomic annotation...       2017-09-26 10:39:55 
>> adding gene annotation...             2017-09-26 10:40:08 
'select()' returned 1:many mapping between keys and columns
>> assigning chromosome lengths          2017-09-26 10:40:08 
>> done...                   2017-09-26 10:40:08
Tags: go, kegg, chip-seq, chipseeker
written 17 months ago by rbronste
tarek.mohamed wrote:

Hi, the org.Mm.eg.db package is queried through the AnnotationDbi package. AnnotationDbi can map identifiers with the mapIds() function, which, besides the annotation object itself, has four main arguments: (1) "keytype" (equivalent to "filters" in the biomaRt package), (2) "column" (equivalent to "attributes" in biomaRt), (3) "keys" (equivalent to "values" in biomaRt), and (4) "multiVals". The "multiVals" argument specifies what mapIds() should do when multiple values could be returned for a single key (take a look at ?mapIds). Options include "first": when there are multiple matches, only the first one that comes back is returned. This is the default behavior.

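So I guess that in your case there are several matches between your gene IDs and the gene symbols and/or gene names, and by default annotatePeak() returns the first value that comes back. Note that the "'select()' returned 1:many mapping" line is an informational message rather than an error: your log continues past it and ends with "done...".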
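To see where the message comes from, here is a minimal sketch that queries org.Mm.eg.db directly. It assumes the AnnotationDbi and org.Mm.eg.db packages are installed; the choice of the first 50 Entrez IDs and of the ENSEMBL column is only for illustration and is not taken from ChIPseeker's internals.

```r
# Sketch only: assumes AnnotationDbi and org.Mm.eg.db are installed.
library(AnnotationDbi)
library(org.Mm.eg.db)

# Take some Entrez gene IDs straight from the database so they are valid keys.
gene_ids <- head(keys(org.Mm.eg.db, keytype = "ENTREZID"), 50)

# select() returns every match, so one Entrez ID can occupy several rows.
# When that happens it prints the "1:many mapping" message; it is not an error.
all_hits <- AnnotationDbi::select(org.Mm.eg.db, keys = gene_ids,
                                  columns = c("SYMBOL", "ENSEMBL"),
                                  keytype = "ENTREZID")

# Entrez IDs that appear on more than one row (may be empty for this subset).
multi_ids <- names(which(table(all_hits$ENTREZID) > 1))

# mapIds() returns one element per key; multiVals decides how ties are resolved.
first_hit <- mapIds(org.Mm.eg.db, keys = gene_ids, column = "ENSEMBL",
                    keytype = "ENTREZID", multiVals = "first")
every_hit <- mapIds(org.Mm.eg.db, keys = gene_ids, column = "ENSEMBL",
                    keytype = "ENTREZID", multiVals = "list")
```

Comparing first_hit with every_hit for the IDs in multi_ids shows which alternative values are dropped when only the first match is kept.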

It is also a good idea to visualize your sorted BED and sorted BAM files, or the summits BED file, against the reference genome build; for this you can use, for example, the Golden Helix genome browser. Then you can compare what you see with the annotation result from ChIPseeker.

N.B. By default annotatePeak() will annotate the peak summits to the nearest gene.

Tarek

written 17 months ago by tarek.mohamed

I guess my issue is that my input is the output of DiffBind, which does not entirely match the standard BED or narrowPeak formats. It seems that what I need to do is retrieve, from the narrowPeak files I gave DiffBind as input, the entries that appear in its output, so that the format is maintained precisely.
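One possible way around the format mismatch is to skip the BED round trip and build a GRanges object from the DiffBind table. The sketch below assumes the GenomicRanges package is installed; the data frame is a made-up stand-in for a DiffBind report, and its column names (seqnames, start, end, Fold, FDR) are an assumption about how the report was saved.

```r
# Sketch only: assumes the Bioconductor package GenomicRanges is installed.
library(GenomicRanges)

# Hypothetical stand-in for a DiffBind report read from disk, e.g. with
# read.delim(); the values and column names here are invented for illustration.
diffbind_df <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(3000500, 4500100, 10200300),
  end      = c(3001000, 4500700, 10200900),
  Fold     = c(1.8, 2.4, 1.2),
  FDR      = c(0.010, 0.002, 0.040)
)

# Convert to GRanges; the non-coordinate columns are kept as metadata columns.
gr <- makeGRangesFromDataFrame(diffbind_df, keep.extra.columns = TRUE)
gr
```

The resulting gr can be passed as the first argument of annotatePeak() in place of the object returned by readPeakFile(). If the DiffBind session is still open, dba.report() should, as far as I know, already return a GRanges by default, which would make the conversion unnecessary.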

written 17 months ago by rbronste

You can also try to build a GRanges object with the ChIPpeakAnno package and then compare it with the GRanges object generated by ChIPseeker. In the end, annotatePeak() uses the GRanges object for peak annotation.

written 17 months ago by tarek.mohamed