R Package For Annotations Of Genomic Regions
9.6 years ago
pmuench ▴ 140

Hi, is there an R package (or some other tool) for easily annotating given genome locations with the type of genomic region they fall in (UTR, intron, exon, ...)? For example:

    chr   start      end
    chr1  100001764  100001784
    chr1  100007129  100007148
    chr1  100010617  100010637
    chr2  10031668   10031688
annotation bioconductor • 11k views
9.6 years ago

Try the locateVariants() function in the VariantAnnotation package. It reports the type of region (for example coding, intron, or UTR) that each range falls in.
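A minimal sketch of this suggestion, applied to the four regions from the question. It assumes the coordinates are hg19 and uses the TxDb.Hsapiens.UCSC.hg19.knownGene package as the gene model; the question does not state the genome build, so swap in the TxDb that matches your data.

```r
# Sketch only: the hg19 knownGene TxDb is an assumption, not something
# stated in the question.
library(GenomicRanges)
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four example regions from the question.
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# AllVariants() asks for every region type the function can report.
loc <- locateVariants(regions, txdb, AllVariants())

# LOCATION is the region type; QUERYID is the index of the input region.
# A region can appear several times, once per transcript it hits.
table(loc$QUERYID, loc$LOCATION)
```

The function was written for variants, but it accepts any GRanges, so short intervals like these work as well.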

9.6 years ago
seidel 8.6k

There's a package called ChIPpeakAnno that is designed to annotate genomic regions. The name is perhaps unfortunate: it was developed for annotating genomic segments resulting from chromatin IP analysis, but it can be used to find the nearest or overlapping feature for any set of regions. Given a set of genomic segments and a set of annotations (e.g. from biomart), it reports which annotated features each segment overlaps or lies closest to. More generally, you can compare any two sets of feature descriptions.

http://www.bioconductor.org/packages/2.9/bioc/html/ChIPpeakAnno.html


Thank you! How can I convert my dataset so that I can use the geneAnnotation function? I think I can use IRanges and RangedData to create a suitable data format for the annotatePeakInBatch() function. But how should I encode the chromosome?
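One possible way to do the conversion, as a sketch: build a GRanges object from the table with makeGRangesFromDataFrame() from GenomicRanges. The data frame below simply retypes the example from the question. Current Bioconductor releases work with GRanges rather than RangedData, so this is probably the more useful target format, but check the documentation of the ChIPpeakAnno version you have installed.

```r
library(GenomicRanges)

# The example table from the question, retyped as a data frame.
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100001764, 100007129, 100010617, 10031668),
  end   = c(100001784, 100007148, 100010637, 10031688)
)

# Columns named chr/start/end are detected automatically.
regions <- makeGRangesFromDataFrame(df)
regions
```

The chromosome is kept as a plain name such as "chr1". What matters is that the names follow the same convention as the annotation you compare against ("chr1" versus "1"); seqlevelsStyle() from GenomeInfoDb can report the convention in use and convert between them.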


As described in the ChIPseeker package, ChIPpeakAnno does not annotate genomic intervals correctly because it does not handle strand information properly. ChIPseeker is a very good alternative.

9.6 years ago
Martin Morgan ★ 1.6k

GenomicFeatures provides functions like exonsBy, transcriptsBy, fiveUTRsByTranscript for extracting information from known gene models, either from packages based on UCSC tracks (e.g., TxDb.Hsapiens.UCSC.hg19.knownGene; see the Annotation packages) or make-your-own via makeTranscriptDbFrom.... See the vignettes on the GenomicFeatures landing page.
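As a rough sketch of how these extractors could be combined to label regions, the code below checks each region from the question against exons, introns, and UTRs. The hg19 knownGene TxDb is an assumption about the genome build, and intronsByTranscript() and threeUTRsByTranscript() are further GenomicFeatures extractors not named above.

```r
# Sketch only: assumes hg19 coordinates.
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four example regions from the question.
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# One GRangesList per feature type, all taken from the same gene models.
features <- list(
  exon     = exonsBy(txdb, by = "tx"),
  intron   = intronsByTranscript(txdb),
  fiveUTR  = fiveUTRsByTranscript(txdb),
  threeUTR = threeUTRsByTranscript(txdb)
)

# Logical matrix: one row per region, one column per feature type,
# TRUE where the region overlaps at least one feature of that type.
hits <- sapply(features, function(f) overlapsAny(regions, f))

cbind(as.data.frame(regions)[, c("seqnames", "start", "end")], hits)
```

A region can be TRUE in several columns, for instance when it is exonic in one transcript and intronic in another, so how to rank the categories is left to you.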


Thank you for your answer, but there is an error when I run: biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")


I wonder what the error is? Are you using a current version of R?


This does not answer the question of how to annotate the regions. After doing the above I get the following:

    > transcriptsBy(txdb, by="gene")
    GRangesList object of length 60554:
    $ENSG00000000003.14
    GRanges object with 5 ranges and 2 metadata columns:
          seqnames                 ranges strand |     tx_id           tx_name
             <Rle>              <IRanges>  <Rle> | <integer>       <character>
      [1]     chrX [100627109, 100637104]      - |    196851 ENST00000612152.4
      [2]     chrX [100628670, 100636806]      - |    196852 ENST00000373020.8
      [3]     chrX [100632063, 100637104]      - |    196853 ENST00000614008.4
      [4]     chrX [100632541, 100636689]      - |    196854 ENST00000496771.5
      [5]     chrX [100633442, 100639991]      - |    196855 ENST00000494424.1

    $ENSG00000000005.5
    GRanges object with 2 ranges and 2 metadata columns:
          seqnames                 ranges strand |  tx_id           tx_name
      [1]     chrX [100584802, 100599885]      + | 193833 ENST00000373031.4
      [2]     chrX [100593624, 100597531]      + | 193834 ENST00000485971.1

    $ENSG00000000419.12
    GRanges object with 6 ranges and 2 metadata columns:
          seqnames               ranges strand |  tx_id           tx_name
      [1]    chr20 [50934867, 50958550]      - | 184765 ENST00000371588.9
      [2]    chr20 [50934867, 50958550]      - | 184766 ENST00000466152.5
      [3]    chr20 [50934867, 50958555]      - | 184767 ENST00000371582.8
      [4]    chr20 [50934896, 50945861]      - | 184768 ENST00000494752.1
      [5]    chr20 [50934945, 50958521]      - | 184769 ENST00000371584.8
      [6]    chr20 [50936148, 50958532]      - | 184770 ENST00000413082.1

    ...
    <60551 more elements>
    -------
    seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

Could you please explain how to annotate the data in the original post (you can assume it is a GRanges object) with the tx_names from my example?
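One way this could be done, as a sketch: take the flat set of transcripts with transcripts(), find overlaps with the regions, and collect the matching tx_name values per region. The txdb in the post above is not available here, so the code uses the hg19 knownGene TxDb as a stand-in; that choice and the regions object built from the original question are assumptions to replace with your own objects.

```r
# Sketch only: the TxDb below is a stand-in for the poster's own txdb.
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four example regions from the original question.
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# All transcripts as one GRanges, carrying their names as metadata.
tx <- transcripts(txdb, columns = "tx_name")

# Pairs of (region index, transcript index) that overlap.
hits <- findOverlaps(regions, tx)

# Group transcript names by region; regions without a hit get an empty entry.
tx_by_region <- split(
  tx$tx_name[subjectHits(hits)],
  factor(queryHits(hits), levels = seq_along(regions))
)

# Store the names as a list-like metadata column, one element per region.
regions$tx_name <- CharacterList(unname(tx_by_region))
regions
```

The regions carry no strand, so they match transcripts on either strand. If your regions are stranded and you want to ignore that, pass ignore.strand = TRUE to findOverlaps().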
