Question

Peak annotation using ChIPseeker (bioconductor package)

0

Entering edit mode

9.0 years ago

s.singh ▴ 70

I want to do peak annotation of C.elegans ChIP data using ChIPseeker.

I have used this command peakAnno <- annotatePeak(files, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Ce.eg.db")

I'm using 'org.Ce.eg.db' from BioConductor for this.

I get the following error:

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

I would really appreciate if someone could help me resolve this error.

ChIP-Seq • 7.1k views

ADD COMMENT • link 9.0 years ago by s.singh ▴ 70

0

Entering edit mode

What are you TxDb are you using?

ADD REPLY • link 9.0 years ago by Sinji ★ 3.2k

0

Entering edit mode

I'm using TxDb.Celegans.UCSC.ce6.ensGene

ADD REPLY • link 9.0 years ago by s.singh ▴ 70

0

Entering edit mode

Hmm, sounds like you're using the right datasets. Mind copy and pasting your code so I can take a look at it? Also put 'chipseeker' as a tag so that the author can get a notification about this question. He's pretty active around here.

ADD REPLY • link 9.0 years ago by Sinji ★ 3.2k

0

Entering edit mode

require(ChIPseeker)

require(TxDb.Celegans.UCSC.ce6.ensGene)

txdb <-TxDb.Celegans.UCSC.ce6.ensGene

require(clusterProfiler)

files = "/home/swadha/Desktop/SFSU/c.elegans_peaks.bed"

peak <- readPeakFile(files)

covplot(peak, weightCol="V5")

promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)

tagMatrix <- getTagMatrix(peak, windows=promoter)

tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color="red")

peakHeatmap(files, TxDb=txdb, upstream=3000, downstream=3000, color="red")

plotAvgProf(tagMatrix, xlim=c(-3000, 3000), xlab="Genomic Region (5'->3')", ylab = "Read Count Frequency")

peakAnno <- annotatePeak(files, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Ce.eg.db")

loading peak file... 2016-07-28 07:51:40 PM preparing features information... 2016-07-28 07:51:41 PM identifying nearest features... 2016-07-28 07:51:41 PM calculating distance from peak to TSS... 2016-07-28 07:51:42 PM assigning genomic annotation... 2016-07-28 07:51:42 PM adding gene annotation... 2016-07-28 07:51:43 PM

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

How to tag someone on biostars??

ADD REPLY • link 9.0 years ago by s.singh ▴ 70

1

Entering edit mode

Try:

peaks <- readPeakFile("/home/swadha/Desktop/SFSU/c.elegans_peaks.bed", as = "GRanges")
peaksAnno <- annotatePeak(peaks, TxDb=txdb, annoDb="org.Ce.eg.db")

Might be useful to include a few lines of your bed file as well so we can take a look at that.

ADD REPLY • link 9.0 years ago by Sinji ★ 3.2k

0

Entering edit mode

not to tag someone, but the package name.

If you tag your post with chipseeker, I can receive email notification.

Please post a few lines of your bed file as suggested by @Sinji.

ADD REPLY • link 9.0 years ago by Guangchuang Yu ★ 2.6k

score 2 · Answer 1 · 2016-07-29

I download a bed file from GEO and test it. Finally I figure out what's going on here.

This is due to the TxDb.Celegans.UCSC.ce6.ensGene object.

> txdb = TxDb.Celegans.UCSC.ce6.ensGene
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: ce6
# Organism: Caenorhabditis elegans
# Taxonomy ID: 6239
# UCSC Table: ensGene
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Ensembl gene ID
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 35019
# exon_nrow: 146833
# cds_nrow: 127881
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-07 18:15:25 +0000 (Wed, 07 Oct 2015)
# GenomicFeatures version at creation time: 1.21.30
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> txidinfo <- transcripts(txdb, columns = c("tx_id", "tx_name", "gene_id"))
> head(txidinfo)
GRanges object with 6 ranges and 3 metadata columns:
      seqnames         ranges strand |     tx_id     tx_name         gene_id
         <Rle>      <IRanges>  <Rle> | <integer> <character> <CharacterList>
  [1]     chrI [11495, 16793]      + |         1  Y74C9A.2.2        Y74C9A.2
  [2]     chrI [11499, 16790]      + |         2  Y74C9A.2.3        Y74C9A.2
  [3]     chrI [11499, 16831]      + |         3  Y74C9A.2.1        Y74C9A.2
  [4]     chrI [11505, 16790]      + |         4  Y74C9A.2.4        Y74C9A.2
  [5]     chrI [11618, 16804]      + |         5  Y74C9A.2.5        Y74C9A.2
  [6]     chrI [43733, 44677]      + |         6    Y74C9A.1        Y74C9A.1
  -------
  seqinfo: 7 sequences (1 circular) from ce6 genome
>

Although the ID type recorded in the txdb metadata is ensemble, Type of Gene ID: Ensembl gene ID (also indicated by the ensGene part of the object name), it actually doesn't have gene ID available in the object.

As indicated in the above code, gene_id is actually transcript name but not ensembl gene ID.

The last step which throw error:

>> adding gene annotation... 2016-07-28 07:51:43 PM

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

is trying to convert ENSEMBL ID toEntrezgene ID and gene Symbol using annoDb="org.Ce.eg.db" which of course will be failed since the ID is not ensemble but tx_name.

This issue actually had been fixed and you are using out-dated version of ChIPseeker.

Please follow my post when you got an issue.

In the latest release version of ChIPseeker, when annotatePeak fail to convert gene ID, it will throw warning and skip this step.

score 0 · Answer 2 · 2016-07-29

0

Entering edit mode

9.0 years ago

s.singh ▴ 70

Hi,

The bed files were generated from the tool macs2 using following command: macs2 callpeak -t HTZ.S36A.Ex39.Br.map.ce10.gz -n HTZ.S36A. -g ce --bdg --keep-dup=auto --broad --broad-cutoff=0.01 --nomodel)

Here is the link of the screenshot of my bed file.

Thanks for replying to my query so actively. I upgraded ChIPseeker (version:1.8.9), but the same error is occurring. How can I overcome this error??

ADD COMMENT • link 9.0 years ago by s.singh ▴ 70

0

Entering edit mode

just use:

peaksAnno <- annotatePeak(peaks, TxDb=txdb)

ADD REPLY • link 9.0 years ago by Guangchuang Yu ★ 2.6k

score 0 · Answer 3 · 2016-07-29

0

Entering edit mode

9.0 years ago

s.singh ▴ 70

That worked perfectly fine!! Thanks a ton Sinji and Guangchuang :)

ADD COMMENT • link 9.0 years ago by s.singh ▴ 70

2

Entering edit mode

> peakAnno <- annotatePeak(peak, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Ce.eg.db")
>> preparing features information...         2016-08-02 15:20:33
>> identifying nearest features...       2016-08-02 15:20:34
>> calculating distance from peak to TSS...  2016-08-02 15:20:38
>> assigning genomic annotation...       2016-08-02 15:20:38
>> adding gene annotation...             2016-08-02 15:20:45
Loading required package: org.Ce.eg.db

>> assigning chromosome lengths          2016-08-02 15:20:45
>> done...                   2016-08-02 15:20:45
Warning message:
In getGeneAnno(annoDb, peak.gr$geneId, type) :
  ID type not matched, gene annotation will not be added...

In ChIPseeker (version >=1.9.5), it works fine with this case.

ADD REPLY • link 9.0 years ago by Guangchuang Yu ★ 2.6k

0

Entering edit mode

Gene annotation was skipped in this case. How would you solve this problem?

ADD REPLY • link 6.3 years ago by Chenzhen Zhang ▴ 10