biomart gene coordinates do not correspond to genome browser
9.1 years ago
tonja.r ▴ 600

I was annotating my dataset with biomart, filtering by chromosomal region, and was surprised by the genes I got, so I took a closer look at PRAMENP (ENSG00000197549).

According to biomart its positions are:

      chromosome_name     start_position     end_position     strand     ensembl_gene_id     hgnc_symbol
1     22                  21991099           22043934         -1         ENSG00000197549     PRAMENP

But if I look at the genome browser I get the following:

GENCODE Transcript Annotation ENST00000337471.4 (PRAMENP)

                     Transcript                  Gene
Gencode id           ENST00000337471.4           ENSG00000197549.5
HAVANA manual id     OTTHUMT00000320276.2        OTTHUMG00000150836.3
Position             chr22:22345497-22398332     chr22:22345497-22398332

Because of these differences, biomart returns many genes that are far away from my dataset (SNPs) according to the genome browser, while the genes that are really close to them (according to the genome browser) do not appear in the biomart results.

ensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
                  path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
filterlist = list("22:21815836:22006492")
attributes.1 = c("chromosome_name","start_position", "end_position","strand", "ensembl_gene_id", "hgnc_symbol")
results.1 = getBM(attributes = attributes.1, filters = c("chromosomal_region"), values = filterlist, mart = ensembl)
> unique(results.1$hgnc_symbol)
[1] "PRAMENP" "MAPK1"   ""        "TOP3B"   "PPM1F"

But according to the genome browser (coordinates: chr22:21,815,836-22,006,492) I should have got UBE2L3, YDJC, PI4KAP2 and some more, not the genes returned by biomart.

I guess the biomart dataset is built on hg38, while I am viewing hg19 in the genome browser. Is it possible to get hsapiens_gene_ensembl on hg19?

ensembl R biomart • 13k views

Switch the genome browser to the older build and see if the retrieved sequences are the same; that might confirm your suspicion.
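
The build can probably also be checked from R rather than in the browser. A minimal sketch, assuming a reasonably recent biomaRt where listDatasets() returns a data frame with "dataset" and "version" columns; dataset_assembly is a hypothetical helper name, not part of biomaRt:

```r
# Hypothetical helper: look up the assembly string that a BioMart dataset
# reports in the "version" column of listDatasets().
dataset_assembly <- function(datasets, dataset = "hsapiens_gene_ensembl") {
  datasets$version[datasets$dataset == dataset]
}

# Needs biomaRt and a network connection, so only run when used interactively.
if (interactive()) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl")
  dataset_assembly(biomaRt::listDatasets(mart))
}
```

If the string returned names GRCh38 while the browser session is on hg19, the coordinates are not expected to agree.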


I did that already. The result is that the biomart dataset is built on hg38 and the genome browser is on hg19, but all my data is on hg19, so I want to be consistent. Is there a biomart dataset on hg19?

9.1 years ago
komal.rathi ★ 4.1k

You can access Ensembl75 (hg19/GRCh37) using:

grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

or

ensembl_75 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
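
With a more recent biomaRt, the same thing can likely be done through useEnsembl() and its GRCh argument. The following is only a sketch that reruns the region query from the question against GRCh37; region_filter and genes_in_region_grch37 are hypothetical helper names, and the attributes and region are the ones from the question:

```r
# Hypothetical helper: build a "chromosomal_region" filter value
# in the "chromosome:start:end" form used in the question.
region_filter <- function(chrom, start, end) {
  sprintf("%s:%d:%d", chrom, as.integer(start), as.integer(end))
}

# Hypothetical helper: query the GRCh37 (hg19) human gene dataset for a region.
# Assumes biomaRt is installed and the GRCh37 Ensembl site is reachable.
genes_in_region_grch37 <- function(chrom, start, end) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl",
                              GRCh = 37)
  biomaRt::getBM(
    attributes = c("chromosome_name", "start_position", "end_position",
                   "strand", "ensembl_gene_id", "hgnc_symbol"),
    filters = "chromosomal_region",
    values = region_filter(chrom, start, end),
    mart = mart
  )
}

# Network access is required, so only run when used interactively.
if (interactive()) {
  results.37 <- genes_in_region_grch37("22", 21815836, 22006492)
  unique(results.37$hgnc_symbol)
}
```

The gene symbols returned this way should be compared against the hg19 view in the genome browser, since both are then on the same assembly.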