Mapping genome regions to gene symbols in two ways get weird results
1
0
Entering edit mode
8.9 years ago
Na Sed ▴ 310

I have some genome regions which I am gonna map them to gene symbols.

To do that there are two methods which have different result. I have mentioned one example below:

1) Using 'Homo.sapiens' package

library(Homo.sapiens)
geneRanges <- function(db, column="ENTREZID")
{
  g <- genes(db, columns=column)
  col <- mcols(g)[[column]]
  genes <- granges(g)[rep(seq_along(g), elementLengths(col))]
  mcols(genes)[[column]] <- as.character(unlist(col))
  genes
}

gns <- geneRanges(Homo.sapiens, column="SYMBOL")
splitColumnByOverlap <-  function(query, subject, column="ENTREZID", ...)
{
  olaps <- findOverlaps(query, subject, ...)
  f1 <- factor(subjectHits(olaps), levels=seq_len(subjectLength(olaps)))
  splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}

library(GenomicRanges)
cnv <- makeGRangesFromDataFrame(data.frame(chr="chr2", start=109993208, end=109993264))
gene.Symbol <- splitColumnByOverlap(gns, cnv, "SYMBOL")

# The obtained gene symbol is 'SH3RF3'.

2) using 'biomaRt' package

library(GenomicRanges)
library(biomaRt)

mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
region <- "2:109993208:109993264"
gene.Symbol <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position","end_position"),
            filters = c("chromosomal_region"), values=region, mart=mart)

The obtained result is:

hgnc_symbol chromosome_name start_position end_position
1   LINC01123               2      109987063    109996140

Anyone knows why the results are different from each other?

Thanks

BiomaRt genome-region Homo.sapiens R • 4.4k views
ADD COMMENT
4
Entering edit mode
8.9 years ago

You're using two different genome versions. Biomart is giving you overlaps on GRCh38. The Homo.Sapiens package is giving you overlaps on GRCh37.

ADD COMMENT
0
Entering edit mode

Oh, I didn't notice to it.

Do you know which of them is suitable for hg19? How can I find that?

ADD REPLY
0
Entering edit mode

hg19 is GRCh37.

ADD REPLY
0
Entering edit mode

How you know Homo.sapiens is GRCh37?

ADD REPLY
0
Entering edit mode

I looked at the genes overlapping the region you specified in GRCh37 and GRCh38. The results from those matched perfectly what you described. It's unfortunate that that R package doesn't mention its genome version.

ADD REPLY
0
Entering edit mode

Thank you so much.

ADD REPLY

Login before adding your answer.

Traffic: 2745 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6