R Package For Annotations Of Genomic Regions
3
3
10.6 years ago
pmuench ▴ 140

Hi, is there an R package (or some other tool) for easily annotating given genomic locations with the type of region they fall in (UTR, intron, exon, ...)? For example:

    chr   start      end
    chr1  100001764  100001784
    chr1  100007129  100007148
    chr1  100010617  100010637
    chr2  10031668   10031688

annotation bioconductor • 12k views
4
10.6 years ago

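Try the locateVariants function in the VariantAnnotation package. It was written for variants, but it reports the region type (coding, intron, UTR, ...) for any set of genomic ranges.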
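
A minimal sketch of that suggestion, assuming the coordinates in the question are hg19 and that the TxDb.Hsapiens.UCSC.hg19.knownGene annotation package is installed (both are assumptions, the original post does not state the genome build):

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Assumption: the regions are on hg19, matching this TxDb package
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four regions from the question as a GRanges object
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# AllVariants() asks for every region type that locateVariants knows about
loc <- locateVariants(regions, txdb, AllVariants())

# LOCATION holds the region type; QUERYID is the row index into `regions`
table(loc$LOCATION)
head(mcols(loc)[, c("QUERYID", "LOCATION", "TXID", "GENEID")])
```

A region can appear several times in the result, once per transcript and region type it falls in, so QUERYID is what links the rows back to the input.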

3
10.6 years ago
seidel 9.8k

There's a package called ChIPpeakAnno that is designed to annotate genomic regions. The name is perhaps unfortunate, as it was developed for annotating genomic segments resulting from chromatin IP analysis, but it can be used to find the nearest or overlapping feature for any set of regions. Given a set of genomic segments and a set of annotations (e.g. from biomart), you can label each segment with the feature it overlaps or lies closest to. More generally, you can compare any two sets of feature descriptions.

http://www.bioconductor.org/packages/2.9/bioc/html/ChIPpeakAnno.html

0

Thank you! How can I convert my dataset so that I can use the annotation functions? I think I can use IRanges and RangedData to create a suitable data format for the annotatePeakInBatch() function. But how should I encode the chromosome names?
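
One possible way to do the conversion, as a sketch: build a GRanges object instead of RangedData (RangedData has since been retired from IRanges, and as far as I know recent ChIPpeakAnno releases take GRanges input). The data frame `df` below is just the table from the question typed in by hand. Whether the chromosomes should be named "chr1" (UCSC style) or "1" (Ensembl/NCBI style) depends on the annotation you compare against; biomaRt annotations typically use the latter, and seqlevelsStyle() from GenomeInfoDb switches between the two.

```r
library(GenomicRanges)
library(GenomeInfoDb)

# The table from the question as a plain data frame
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100001764, 100007129, 100010617, 10031668),
  end   = c(100001784, 100007148, 100010637, 10031688)
)

# Columns named chr/start/end are recognised automatically
gr <- makeGRangesFromDataFrame(df)

# Copy with Ensembl/NCBI-style names ("1", "2") for use with
# annotations that omit the "chr" prefix
gr_ncbi <- gr
seqlevelsStyle(gr_ncbi) <- "NCBI"

seqlevels(gr)
seqlevels(gr_ncbi)
```

Keeping both objects makes it easy to pick whichever naming style matches the annotation set at hand.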

1

According to the ChIPseeker package documentation, ChIPpeakAnno does not annotate genomic intervals correctly because it does not handle strand information properly. ChIPseeker is a very good alternative.

3
10.6 years ago
Martin Morgan ★ 1.6k

GenomicFeatures provides functions like exonsBy, transcriptsBy, fiveUTRsByTranscript for extracting information from known gene models, either from packages based on UCSC tracks (e.g., TxDb.Hsapiens.UCSC.hg19.knownGene; see the Annotation packages) or make-your-own via makeTranscriptDbFrom.... See the vignettes on the GenomicFeatures landing page.
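
As a rough sketch of how those extractors could be combined with the regions from the question: pull each feature type out of the TxDb, then flag which regions overlap it. This assumes hg19 coordinates and an installed TxDb.Hsapiens.UCSC.hg19.knownGene package; the column names added to `regions` are my own choice, not something the package defines.

```r
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Assumption: the regions are on hg19, matching this TxDb package
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# The four regions from the question
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges = IRanges(
    start = c(100001764, 100007129, 100010617, 10031668),
    end   = c(100001784, 100007148, 100010637, 10031688)
  )
)

# One GRanges per feature type, flattened from the per-transcript lists
features <- list(
  exon     = exons(txdb),
  intron   = unlist(intronsByTranscript(txdb)),
  fiveUTR  = unlist(fiveUTRsByTranscript(txdb)),
  threeUTR = unlist(threeUTRsByTranscript(txdb))
)

# Add one logical column per feature type: TRUE if the region overlaps it
for (name in names(features)) {
  mcols(regions)[[name]] <- overlapsAny(regions, features[[name]],
                                        ignore.strand = TRUE)
}

regions
```

A region can be TRUE in several columns at once (for example an exon of one transcript that lies in an intron of another), so how to prioritise the categories is a separate decision.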

0

Thank you for your answer, but there is an error when I run: biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")

1

I wonder what the error is? Are you using a current version of R?

0

This does not answer the question about how to annotate them. After doing the above I get the following:

> transcriptsBy(txdb, by="gene")
GRangesList object of length 60554:
$ENSG00000000003.14
GRanges object with 5 ranges and 2 metadata columns:
seqnames                 ranges strand |     tx_id           tx_name
<Rle>              <IRanges>  <Rle> | <integer>       <character>
[1]     chrX [100627109, 100637104]      - | 196851 ENST00000612152.4
[2]     chrX [100628670, 100636806]      - | 196852 ENST00000373020.8
[3]     chrX [100632063, 100637104]      - | 196853 ENST00000614008.4
[4]     chrX [100632541, 100636689]      - | 196854 ENST00000496771.5
[5]     chrX [100633442, 100639991]      - | 196855 ENST00000494424.1

$ENSG00000000005.5
GRanges object with 2 ranges and 2 metadata columns:
seqnames                 ranges strand |  tx_id           tx_name
[1]     chrX [100584802, 100599885]      + | 193833 ENST00000373031.4
[2]     chrX [100593624, 100597531]      + | 193834 ENST00000485971.1

$ENSG00000000419.12
GRanges object with 6 ranges and 2 metadata columns:
seqnames               ranges strand |  tx_id           tx_name
[1]    chr20 [50934867, 50958550]      - | 184765 ENST00000371588.9
[2]    chr20 [50934867, 50958550]      - | 184766 ENST00000466152.5
[3]    chr20 [50934867, 50958555]      - | 184767 ENST00000371582.8
[4]    chr20 [50934896, 50945861]      - | 184768 ENST00000494752.1
[5]    chr20 [50934945, 50958521]      - | 184769 ENST00000371584.8
[6]    chr20 [50936148, 50958532]      - | 184770 ENST00000413082.1

...
<60551 more elements>
-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths


Could you please explain how to annotate the data in the original post (you can assume it is a GRanges object) with the tx_names from my example?
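
One way this could be done, as a sketch rather than a definitive answer: overlap the query regions with a flat GRanges of transcripts and collapse the matching tx_name values per region. To keep the example self-contained, `tx` below is typed in from three of the transcripts shown in the output above, and the `query` regions are hypothetical (the regions in the original post are on chr1 and chr2, so they would not hit these transcripts). With a real TxDb, `tx` would presumably come from `transcripts(txdb)` or `unlist(transcriptsBy(txdb, by = "gene"))`.

```r
library(GenomicRanges)

# Three transcripts copied from the transcriptsBy() output above
tx <- GRanges(
  seqnames = c("chrX", "chrX", "chr20"),
  ranges = IRanges(
    start = c(100584802, 100593624, 50934867),
    end   = c(100599885, 100597531, 50958550)
  ),
  strand = c("+", "+", "-"),
  tx_name = c("ENST00000373031.4", "ENST00000485971.1", "ENST00000371588.9")
)

# Hypothetical query regions, chosen so that some overlap the transcripts
query <- GRanges(
  seqnames = c("chrX", "chrX", "chr20", "chr1"),
  ranges = IRanges(
    start = c(100590000, 100595000, 50940000, 100001764),
    end   = c(100590020, 100595020, 50940020, 100001784)
  )
)

# All (query, transcript) pairs that overlap, regardless of strand
hits <- findOverlaps(query, tx, ignore.strand = TRUE)

# Group transcript names by query index; queries without hits get an empty set
tx_by_query <- split(
  tx$tx_name[subjectHits(hits)],
  factor(queryHits(hits), levels = seq_along(query))
)

# Collapse each group into one semicolon-separated string ("" if no overlap)
query$tx_name <- unname(vapply(
  tx_by_query,
  function(x) paste(sort(x), collapse = ";"),
  character(1)
))

query
```

Regions that overlap several transcripts get all names joined with ";", and regions that overlap none get an empty string. Storing a CharacterList instead of a collapsed string would be an alternative if the names need to be processed further.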