Question: Map Ensembl gene IDs from hg38 to hg19
2.5 years ago by
Sweden
Tanvir Ahamed 270 wrote:

I want to map all Ensembl gene IDs from hg38 to hg19. Any help would be appreciated. Thanks!

Example:

Loading the library

library(biomaRt)

List of miRNA genes from hg38

# Default Ensembl mart (currently GRCh38); a single filter takes a single value
grch38     <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
miRNA38    <- getBM( attributes=c("ensembl_gene_id","transcript_biotype"),
                     filters="transcript_biotype",values="miRNA", mart=grch38)

Result: 4555 Ensembl gene IDs in total

List of miRNA genes from hg19

# GRCh37 archive mart; a single filter takes a single value
grch37     <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org",
                      path="/biomart/martservice",dataset="hsapiens_gene_ensembl")
miRNA37    <- getBM(attributes=c("ensembl_gene_id","transcript_biotype"),
                     filters="transcript_biotype",values="miRNA", mart=grch37)

Result: 3411 Ensembl gene IDs in total

Look up the hg38/GRCh38 ensembl_gene_id values in hg19/GRCh37

# Query the GRCh37 mart with the gene IDs obtained from GRCh38
en_id_hg38  <- miRNA38$ensembl_gene_id
miRNA38_19  <- getBM( attributes=c("ensembl_gene_id","transcript_biotype"),
                     filters="ensembl_gene_id",values=en_id_hg38, mart=grch37)

Result: 2802 Ensembl gene IDs in total. The remaining 1753 (4555 - 2802) hg38 Ensembl gene IDs are not found in hg19.

Now, how can I map these 1753 hg38 Ensembl gene IDs to hg19?
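
To see exactly which IDs are affected, one option is to take the set difference between the IDs that were queried and the IDs that came back. The sketch below is only an illustration: the IDs are made-up toy values standing in for miRNA38$ensembl_gene_id and miRNA38_19$ensembl_gene_id, and the helper name split_by_mapping is hypothetical.

```r
# Hypothetical helper: split the GRCh38 IDs into those that the GRCh37
# query returned and those it did not. Works on plain character vectors.
split_by_mapping <- function(ids_hg38, ids_found_hg19) {
  ids_hg38 <- unique(ids_hg38)  # count each gene ID once, not once per row
  list(
    mapped   = intersect(ids_hg38, ids_found_hg19),
    unmapped = setdiff(ids_hg38, ids_found_hg19)
  )
}

# Toy IDs (made up for illustration). With the real data these would be
# miRNA38$ensembl_gene_id and miRNA38_19$ensembl_gene_id.
ids_hg38       <- c("ENSG00000000001", "ENSG00000000002",
                    "ENSG00000000003", "ENSG00000000002")
ids_found_hg19 <- c("ENSG00000000002")

res <- split_by_mapping(ids_hg38, ids_found_hg19)
print(res$unmapped)          # the IDs that need further investigation
print(length(res$unmapped))  # how many there are
```

A handful of entries from res$unmapped would also make good examples to post when asking why particular IDs are missing.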

hg19 hg38 sequencing • 2.0k views
modified 2.5 years ago • written 2.5 years ago by Tanvir Ahamed 270

Unless the gene structure has changed, the Ensembl gene ID should be the same across releases and assemblies; for example, the Ensembl gene ID for BRCA2 is ENSG00000139618 in both GRCh38 and GRCh37. However, even minor changes, for example to the UTR, can result in a different ID being assigned. Do you have a gene (or list of genes), and do you know how they changed between the assemblies?

written 2.5 years ago by Denise - Open Targets 4.7k

Example added to the main question!

modified 2.5 years ago • written 2.5 years ago by Tanvir Ahamed 270

There was a very useful post:

Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)

written 2.5 years ago by natasha.sernova 3.1k

As far as I understand, the OP would like to convert IDs, not coordinates, though.

written 2.5 years ago by Denise - Open Targets 4.7k

That's true, my fault.

written 2.5 years ago by natasha.sernova 3.1k

There could be two things going on here. Firstly, some of the IDs found in GRCh38 but not in GRCh37 may simply belong to loci that were never annotated in GRCh37, only in GRCh38. The other possibility is that the loci exist in GRCh37 but were given a different ENSG ID in GRCh38 because the gene models changed. Perhaps you could download the latest GTF files for GRCh38 and GRCh37 from the Ensembl FTP sites and compare them.
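
A rough sketch of how such a GTF comparison might look in base R is given below. It rests on several assumptions: the inline GTF lines, IDs and gene names are toy data invented for illustration, the helper names parse_gtf_genes and toy_gene_line are hypothetical, the attribute key is taken to be gene_biotype, and matching on gene_name is only a heuristic for spotting loci that may carry a different ID, not a definitive mapping.

```r
# Hypothetical helper: extract gene_id, gene_name and gene_biotype from the
# "gene" feature lines of a GTF file supplied as a character vector.
parse_gtf_genes <- function(lines) {
  lines   <- lines[!grepl("^#", lines)]            # drop header comments
  fields  <- strsplit(lines, "\t", fixed = TRUE)
  is_gene <- vapply(fields, function(f) length(f) >= 9 && f[3] == "gene",
                    logical(1))
  attrs   <- vapply(fields[is_gene], function(f) f[9], character(1))
  get_attr <- function(key) {
    m <- regmatches(attrs, regexec(paste0('(^|; *)', key, ' "([^"]*)"'), attrs))
    vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_,
           character(1))
  }
  data.frame(gene_id      = get_attr("gene_id"),
             gene_name    = get_attr("gene_name"),
             gene_biotype = get_attr("gene_biotype"),
             stringsAsFactors = FALSE)
}

# Toy GTF lines (made up). Real files could be read with something like
# readLines(gzfile("path/to/GRCh38.gtf.gz")); that path is a placeholder.
toy_gene_line <- function(id, name) {
  paste("1", "ensembl", "gene", "100", "180", ".", "+", ".",
        sprintf('gene_id "%s"; gene_name "%s"; gene_biotype "miRNA";', id, name),
        sep = "\t")
}
gtf38 <- c("#!toy header",
           toy_gene_line("ENSG_TOY_A", "MIR_TOY1"),
           toy_gene_line("ENSG_TOY_B", "MIR_TOY2"),
           toy_gene_line("ENSG_TOY_C", "MIR_TOY3"))
gtf37 <- c(toy_gene_line("ENSG_TOY_A", "MIR_TOY1"),
           toy_gene_line("ENSG_TOY_X", "MIR_TOY2"))

g38 <- parse_gtf_genes(gtf38)
g37 <- parse_gtf_genes(gtf37)

# Keep the GRCh38 miRNA genes and classify each one against GRCh37
comparison <- g38[which(g38$gene_biotype == "miRNA"), ]
comparison$status <- ifelse(
  comparison$gene_id %in% g37$gene_id, "same_id",
  ifelse(comparison$gene_name %in% g37$gene_name,
         "name_match_different_id", "absent_in_GRCh37"))
# Candidate GRCh37 ID found via the gene name (NA when no name match)
comparison$grch37_id <- g37$gene_id[match(comparison$gene_name, g37$gene_name)]
print(comparison)
```

Genes classified as name_match_different_id would correspond to the second possibility above, and those classified as absent_in_GRCh37 to the first, although a name can also change between annotations, so any match would need checking by hand.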

written 2.5 years ago by Denise - Open Targets 4.7k

Thanks for your time and reply.

I have also tried the GTF files from the Ensembl FTP site for both GRCh37 and GRCh38, and tried to map the miRNA (biotype) ENSG IDs from GRCh38 to GRCh37, but I could not figure out a working solution. :(

modified 2.5 years ago • written 2.5 years ago by Tanvir Ahamed 270

Post a few examples of IDs that do not map so @Denise can figure out what is going on.

modified 2.5 years ago • written 2.5 years ago by genomax 59k

Or better still, email the Ensembl helpdesk as they can find out if this is a feature or a bug, if you know what I mean.

written 2.5 years ago by Denise - Open Targets 4.7k