Question

R Package For Annotations Of Genomic Regions

3

12.1 years ago

pmuench ▴ 140

Hi, is there an R package (or some other tool) that makes it easy to annotate given genome locations with the type of genomic region they fall in (UTR, intron, exon, ...)? For example:

    chr   start      end
    chr1  100001764  100001784
    chr1  100007129  100007148
    chr1  100010617  100010637
    chr2  10031668   10031688

annotation bioconductor • 14k views

ADD COMMENT • link updated 22 months ago by Ram 43k • written 12.1 years ago by pmuench ▴ 140

score 4 · Answer 1 · 2012-04-21

4

12.1 years ago

Steve Lianoglou 5.2k

Try the locateVariants function in the VariantAnnotation package.

ADD COMMENT • link 12.1 years ago by Steve Lianoglou 5.2k
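A minimal sketch of how locateVariants could be applied to the regions from the question. It assumes the coordinates are hg19 and uses TxDb.Hsapiens.UCSC.hg19.knownGene as the gene model; both are assumptions, since the question does not name a genome build. The object names regions and loc are only illustrative.

```r
library(GenomicRanges)
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumption: regions are hg19 coordinates

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four example regions from the question, as a GRanges object
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# AllVariants() requests every location class that locateVariants() knows about
# (coding, intron, fiveUTR, threeUTR, intergenic, spliceSite, promoter)
loc <- locateVariants(regions, txdb, AllVariants())

# QUERYID points back into `regions`; LOCATION holds the region class.
# One region can produce several rows (one per overlapping transcript).
table(loc$QUERYID, loc$LOCATION)
```

Although the function is named for variants, it takes any GRanges as the query, so it should work for short intervals like these as well.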

Ram · Answer 2 · 2012-04-21

3

12.1 years ago

seidel 11k

There's a package called ChIPpeakAnno that is designed to annotate genomic regions. The name is perhaps unfortunate, as it was developed for annotating genomic segments resulting from chromatin IP analysis, but it can be used to find the nearest or overlapping feature for any set of regions. Given a set of genomic segments and a set of annotations (e.g. from biomart), it reports which annotated features each segment overlaps or lies closest to. More generally, you can compare any two sets of feature descriptions.

http://www.bioconductor.org/packages/2.9/bioc/html/ChIPpeakAnno.html

ADD COMMENT • link 12.1 years ago by seidel 11k

0

Thank you! How can I convert my dataset so that I can use the geneAnnotation function? I think I can use IRanges and RangedData to create a suitable data format for the annotatePeakInBatch() function. But how must I encode the chromosome?

ADD REPLY • link 12.1 years ago by pmuench ▴ 140
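One possible way to build the input object and deal with chromosome naming, sketched with GRanges rather than RangedData (RangedData was deprecated in later Bioconductor releases). The data frame df and the object regions are illustrative names. The key point is that the chromosome names must use the same style as the annotation you compare against: "chr1" (UCSC) versus "1" (NCBI/Ensembl).

```r
library(GenomicRanges)
library(GenomeInfoDb)  # provides seqlevelsStyle()

# The table from the question as a plain data frame
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100001764, 100007129, 100010617, 10031668),
  end   = c(100001784, 100007148, 100010637, 10031688)
)

# makeGRangesFromDataFrame() recognises the chr/start/end column names
regions <- makeGRangesFromDataFrame(df)

# Inspect the current naming style, then switch it if the annotation
# (e.g. one retrieved from biomart) uses names without the "chr" prefix
seqlevelsStyle(regions)
seqlevelsStyle(regions) <- "NCBI"
seqlevels(regions)
```

If the annotation already uses UCSC-style names, the seqlevelsStyle<- step can simply be left out.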

1

According to the ChIPseeker package documentation, ChIPpeakAnno does not annotate genomic intervals correctly because it does not handle strand information properly. ChIPseeker is a very good alternative.

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by Tom ▴ 240
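For reference, a rough sketch of what the ChIPseeker route might look like for the regions in the question. The hg19 TxDb package and the tssRegion window of 3 kb on either side are assumptions chosen for illustration, not settings taken from this thread.

```r
library(GenomicRanges)
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumption: regions are hg19 coordinates

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# annotatePeak() accepts a GRanges and a TxDb; tssRegion defines what counts
# as "promoter" around each transcription start site (illustrative value)
anno <- annotatePeak(regions, TxDb = txdb, tssRegion = c(-3000, 3000))

# Convert the result back to GRanges; the `annotation` column holds the
# genomic feature assigned to each region
annotated <- as.GRanges(anno)
annotated$annotation
```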

Ram · Answer 3 · 2012-04-22

3

12.1 years ago

Martin Morgan ★ 1.6k

GenomicFeatures provides functions like exonsBy, transcriptsBy, fiveUTRsByTranscript for extracting information from known gene models, either from packages based on UCSC tracks (e.g., TxDb.Hsapiens.UCSC.hg19.knownGene; see the Annotation packages) or make-your-own via makeTranscriptDbFrom.... See the vignettes on the GenomicFeatures landing page.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 12.1 years ago by Martin Morgan ★ 1.6k
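A sketch of how these extractor functions could be combined to flag which feature types each region overlaps. It assumes hg19 coordinates and the TxDb.Hsapiens.UCSC.hg19.knownGene package; the features list and the column names added to regions are illustrative choices.

```r
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumption: regions are hg19 coordinates

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# One flat GRanges per feature type; the *ByTranscript functions return a
# GRangesList, which unlist() collapses into a single GRanges
features <- list(
  exon     = exons(txdb),
  intron   = unlist(intronsByTranscript(txdb)),
  fiveUTR  = unlist(fiveUTRsByTranscript(txdb)),
  threeUTR = unlist(threeUTRsByTranscript(txdb))
)

# Add one logical metadata column per feature type
for (name in names(features)) {
  mcols(regions)[[name]] <- overlapsAny(regions, features[[name]],
                                        ignore.strand = TRUE)
}

regions
```

A region can be TRUE in several columns at once, for example when it is exonic in one transcript and intronic in another.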

0

Thank you for your answer, but there is an error when I run: biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")

ADD REPLY • link 12.1 years ago by pmuench ▴ 140

1

I wonder what the error is? Are you using a current version of R?

ADD REPLY • link 12.1 years ago by Martin Morgan ★ 1.6k

0

This does not answer the question of how to annotate the regions. After doing the above I get the following:

    > transcriptsBy(txdb, by="gene")
    GRangesList object of length 60554:
    $ENSG00000000003.14 
    GRanges object with 5 ranges and 2 metadata columns:
          seqnames                 ranges strand |     tx_id           tx_name
             <Rle>              <IRanges>  <Rle> | <integer>       <character>
      [1]     chrX [100627109, 100637104]      - |    196851 ENST00000612152.4
      [2]     chrX [100628670, 100636806]      - |    196852 ENST00000373020.8
      [3]     chrX [100632063, 100637104]      - |    196853 ENST00000614008.4
      [4]     chrX [100632541, 100636689]      - |    196854 ENST00000496771.5
      [5]     chrX [100633442, 100639991]      - |    196855 ENST00000494424.1

    $ENSG00000000005.5 
    GRanges object with 2 ranges and 2 metadata columns:
          seqnames                 ranges strand |  tx_id           tx_name
      [1]     chrX [100584802, 100599885]      + | 193833 ENST00000373031.4
      [2]     chrX [100593624, 100597531]      + | 193834 ENST00000485971.1

    $ENSG00000000419.12 
    GRanges object with 6 ranges and 2 metadata columns:
          seqnames               ranges strand |  tx_id           tx_name
      [1]    chr20 [50934867, 50958550]      - | 184765 ENST00000371588.9
      [2]    chr20 [50934867, 50958550]      - | 184766 ENST00000466152.5
      [3]    chr20 [50934867, 50958555]      - | 184767 ENST00000371582.8
      [4]    chr20 [50934896, 50945861]      - | 184768 ENST00000494752.1
      [5]    chr20 [50934945, 50958521]      - | 184769 ENST00000371584.8
      [6]    chr20 [50936148, 50958532]      - | 184770 ENST00000413082.1

    ...
    <60551 more elements>
    -------
    seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

Could you please explain how to annotate the data in the original post (you can assume it is a GRanges object) with the tx_names from my example?

ADD REPLY • link 8.0 years ago by endrebak ▴ 960
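One way this could be done is with findOverlaps against the flat transcripts() table, sketched below. The txdb in the output above appears to be built from an Ensembl/GENCODE-style annotation that is not available here, so the sketch substitutes TxDb.Hsapiens.UCSC.hg19.knownGene as a stand-in (an assumption); with that package the tx_name values are UCSC identifiers rather than ENST identifiers. The names hits and tx_by_region are illustrative.

```r
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # stand-in for the poster's own txdb

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# transcripts() returns a flat GRanges with tx_id and tx_name metadata columns
tx <- transcripts(txdb)

# Pair every region with every transcript it overlaps
hits <- findOverlaps(regions, tx, ignore.strand = TRUE)

# Group the matching tx_name values by query region; using a factor with one
# level per region keeps regions without any hit as empty entries
tx_by_region <- split(
  tx$tx_name[subjectHits(hits)],
  factor(queryHits(hits), levels = seq_along(regions))
)

# Store the result as a list-like metadata column (zero or more names per region)
regions$tx_name <- CharacterList(unname(tx_by_region))
regions
```

A list column is used because a region can overlap several transcripts; if only one name per region is wanted, findOverlaps can be called with select = "first" instead.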