Entering edit mode
8.9 years ago
Na Sed
▴
310
I have some genome regions which I am gonna map them to gene symbols.
To do that there are two methods which have different result. I have mentioned one example below:
1) Using 'Homo.sapiens' package
library(Homo.sapiens)
geneRanges <- function(db, column="ENTREZID")
{
g <- genes(db, columns=column)
col <- mcols(g)[[column]]
genes <- granges(g)[rep(seq_along(g), elementLengths(col))]
mcols(genes)[[column]] <- as.character(unlist(col))
genes
}
gns <- geneRanges(Homo.sapiens, column="SYMBOL")
splitColumnByOverlap <- function(query, subject, column="ENTREZID", ...)
{
olaps <- findOverlaps(query, subject, ...)
f1 <- factor(subjectHits(olaps), levels=seq_len(subjectLength(olaps)))
splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}
library(GenomicRanges)
cnv <- makeGRangesFromDataFrame(data.frame(chr="chr2", start=109993208, end=109993264))
gene.Symbol <- splitColumnByOverlap(gns, cnv, "SYMBOL")
# The obtained gene symbol is 'SH3RF3'.
2) using 'biomaRt' package
library(GenomicRanges)
library(biomaRt)
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
region <- "2:109993208:109993264"
gene.Symbol <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position","end_position"),
filters = c("chromosomal_region"), values=region, mart=mart)
The obtained result is:
hgnc_symbol chromosome_name start_position end_position
1 LINC01123 2 109987063 109996140
Anyone knows why the results are different from each other?
Thanks
Oh, I didn't notice to it.
Do you know which of them is suitable for hg19? How can I find that?
hg19 is GRCh37.
How you know Homo.sapiens is GRCh37?
I looked at the genes overlapping the region you specified in GRCh37 and GRCh38. The results from those matched perfectly what you described. It's unfortunate that that R package doesn't mention its genome version.
Thank you so much.