biomart gene coordinates do not correspond to genome browser
6.8 years ago
tonja.r ▴ 500

I was annotating my dataset with biomart, filtering by chromosomal region, and was surprised by the genes I got, so I took a closer look at PRAMENP (ENSG00000197549).

According to biomart its positions are:

  chromosome_name start_position end_position strand ensembl_gene_id hgnc_symbol
1              22       21991099     22043934     -1 ENSG00000197549     PRAMENP

 

but if I look at the genome browser I get the following:

 

GENCODE Transcript Annotation ENST00000337471.4 (PRAMENP)

                  Transcript               Gene
Gencode id        ENST00000337471.4        ENSG00000197549.5
HAVANA manual id  OTTHUMT00000320276.2     OTTHUMG00000150836.3
Position          chr22:22345497-22398332  chr22:22345497-22398332

 

Because of these differences, biomart returns many genes that are far away from my dataset (SNPs) according to the genome browser, while the genes that are really close to them (according to the genome browser) do not appear in the biomart results.
ensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
                  path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
filterlist = list("22:21815836:22006492")
attributes.1 = c("chromosome_name","start_position", "end_position","strand", "ensembl_gene_id", "hgnc_symbol")
results.1 = getBM(attributes = attributes.1, filters = c("chromosomal_region"), values = filterlist, mart = ensembl)
> unique(results.1$hgnc_symbol)
[1] "PRAMENP" "MAPK1"   ""        "TOP3B"   "PPM1F" 

But according to the genome browser (coordinates: chr22:21,815,836-22,006,492) I should have got UBE2L3, YDJC, PI4KAP2 and some more, but not those identified by biomart.

 

I guess the biomart dataset is built on hg38 and I am viewing hg19 in the genome browser. Is it possible to get hsapiens_gene_ensembl in hg19?

R biomart ensembl • 10k views

Switch the genome browser to the other build and see if the retrieved coordinates are the same; that might confirm your suspicion.


I did that already. The result is that the biomart dataset is built on hg38 and the genome browser is on hg19, but all my data is on hg19, so I want to be consistent. Is there a biomart dataset on hg19?

6.8 years ago
komal.rathi ★ 3.9k

You can access Ensembl75 (hg19/GRCh37) using:

grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

or 

ensembl_75 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
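
To tie this back to the query in the question, here is a minimal sketch of the same chromosomal_region lookup run against GRCh37. The helper names region_filter and genes_in_region_grch37 are made up for illustration. It assumes a reasonably recent biomaRt, where useEnsembl() accepts GRCh = 37 as a shortcut for the grch37.ensembl.org host, and it needs network access, so I have not listed any expected gene symbols.

```r
# Build the "chromosome:start:end" string expected by the chromosomal_region filter.
# (hypothetical helper, not part of biomaRt)
region_filter <- function(chrom, start, end) {
  stopifnot(start <= end)
  # %d with as.integer avoids scientific notation such as "2e+07"
  sprintf("%s:%d:%d", chrom, as.integer(start), as.integer(end))
}

# Fetch genes overlapping a region on GRCh37 (hg19).
# (hypothetical helper; contacts the Ensembl GRCh37 BioMart server)
genes_in_region_grch37 <- function(chrom, start, end) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl",
                              GRCh = 37)
  biomaRt::getBM(
    attributes = c("chromosome_name", "start_position", "end_position",
                   "strand", "ensembl_gene_id", "hgnc_symbol"),
    filters = "chromosomal_region",
    values = region_filter(chrom, start, end),
    mart = mart
  )
}

# Example call (not run here because it needs the remote server):
# res <- genes_in_region_grch37("22", 21815836, 22006492)
# unique(res$hgnc_symbol)
```

If the symbols returned now match what the hg19 genome browser shows for the same window, the mismatch was down to the assembly and not the filter.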