Question: Convert between RefSeq and Ensembl Transcript?
1
6.0 years ago by pwg46 (430), United States
pwg46 wrote:

Hello, I am looking for a good data file to convert between RefSeq (specifically, accession identifiers such as NM_...) and Ensembl transcript IDs (ENST...). I am looking for a file that can be downloaded easily with a simple script using FTP. I would also prefer a text file in a reasonably regular format, since I would essentially be parsing out the ENST and NM_ parts and inserting them into a MySQL table.

ensembl convert refseq enst nm_ • 21k views
modified 4.6 years ago by ashbigdeli (30) • written 6.0 years ago by pwg46 (430)

Are you looking to store ENS and NM ids for the same sequence? Or do you wanna store GenBank and ENSEMBL entries?

written 6.0 years ago by RamRS (27k)

Hmm, I guess the latter. I've been looking through the GenBank data files, which come as one large file per chromosome. These files do have ENST -> NM_ mappings for every transcript on each chromosome, but I feel that using them would not be efficient. Not only are they large and slow to download, but parser scripts would also take quite a while to run, even though I simply want to create a tab-delimited text file with the ENST ID in the first column and its corresponding NM_ ID in the second.

written 6.0 years ago by pwg46 (430)

I'd suggest tinkering with UCSC Genome Browser's mysql database. You should be able to write a query/script that, given ID1, does a bunch of SELECTs for ID2.
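The "given ID1, SELECT ID2" pattern can be sketched locally before pointing it at any remote server. This is only an illustration: Python's built-in sqlite3 stands in for MySQL, the table name `enst_refseq` and its columns are hypothetical, and the example rows are copied from the Ensembl query output elsewhere in this thread rather than from UCSC.

```python
import sqlite3

# Example pairs taken from the query output shown in this thread.
ROWS = [
    ("ENST00000427390", "NM_001145004.1"),
    ("ENST00000439682", "NM_001277304.1"),
    ("ENST00000439682", "NM_207355.2"),
    ("ENST00000261847", "NM_014701.3"),
]


def build_table(rows):
    """Create an in-memory mapping table (hypothetical schema) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enst_refseq ("
        " enst TEXT NOT NULL,"
        " refseq TEXT NOT NULL,"
        " PRIMARY KEY (enst, refseq))"
    )
    conn.executemany("INSERT INTO enst_refseq VALUES (?, ?)", rows)
    conn.commit()
    return conn


def refseq_for(conn, enst):
    """Return every RefSeq accession mapped to one ENST ID."""
    cur = conn.execute(
        "SELECT refseq FROM enst_refseq WHERE enst = ? ORDER BY refseq",
        (enst,),
    )
    return [row[0] for row in cur]


def enst_for(conn, refseq):
    """Return every ENST ID mapped to one RefSeq accession."""
    cur = conn.execute(
        "SELECT enst FROM enst_refseq WHERE refseq = ? ORDER BY enst",
        (refseq,),
    )
    return [row[0] for row in cur]


conn = build_table(ROWS)
print(refseq_for(conn, "ENST00000439682"))
print(enst_for(conn, "NM_014701.3"))
```

The composite primary key allows one transcript to map to several accessions, which does happen in the data (ENST00000439682 appears twice above).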

written 6.0 years ago by RamRS (27k)
10
6.0 years ago by Bert Overduin (3.7k), Edinburgh Genomics, The University of Edinburgh
Bert Overduin wrote:
mysql -u anonymous -h ensembldb.ensembl.org

mysql> use homo_sapiens_core_75_37;

mysql> SELECT transcript.stable_id, xref.display_label
>       FROM transcript, object_xref, xref, external_db
>       WHERE transcript.transcript_id = object_xref.ensembl_id
>       AND object_xref.ensembl_object_type = 'Transcript'
>       AND object_xref.xref_id = xref.xref_id
>       AND xref.external_db_id = external_db.external_db_id
>       AND external_db.db_name = 'RefSeq_mRNA';

+-----------------+----------------+
| stable_id       | display_label  |
+-----------------+----------------+
| ENST00000517143 | NR_046932.1    |
| ENST00000362897 | NR_046944.1    |
| ENST00000384568 | NR_046948.1    |
| ENST00000384769 | NR_002574.1    |
| ENST00000384323 | NR_002575.1    |
| ENST00000516690 | NR_046934.1    |
| ENST00000363970 | NR_046947.1    |
| ENST00000365367 | NR_046928.1    |
| ENST00000517119 | NR_046935.1    |
| ENST00000427390 | NM_001145004.1 |
| ENST00000384474 | NR_046940.1    |
| ENST00000454856 | NM_001277303.1 |
| ENST00000516986 | NR_046930.1    |
| ENST00000559471 | NM_001193489.1 |
| ENST00000261847 | NM_014701.3    |
| ENST00000439682 | NM_001277304.1 |
| ENST00000439682 | NM_207355.2    |
| ENST00000411348 | NR_046931.1    |
| ENST00000298232 | NM_199259.2    |
| ENST00000361285 | NM_199261.2    |
| ENST00000342420 | NM_199260.2    |
| ENST00000569541 | NM_031421.2    |
| ENST00000299443 | NM_174981.3    |
| ENST00000399848 | NM_181482.4    |
| ENST00000359446 | NM_181481.4    |
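To get from this result to the two-column text file the question asks for, one option is to run the same query non-interactively with the mysql client's `--batch` and `-e` options, which print tab-separated rows with a header line, and then post-process the text. The sketch below assumes that output format; the function names are made up for illustration, and the sample rows are copied from the result above. Since the result also contains NR_ accessions, it keeps only NM_ entries.

```python
import csv
import io


def parse_mysql_batch(text):
    """Parse tab-separated `mysql --batch` output into (ENST, RefSeq) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip the header row and anything that is not a two-column row.
        if len(fields) != 2 or fields[0] == "stable_id":
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


def keep_mrna(pairs):
    """Keep only NM_ accessions, dropping NR_ (non-coding) entries."""
    return [(enst, acc) for enst, acc in pairs if acc.startswith("NM_")]


def to_tsv(pairs):
    """Render pairs as tab-delimited text, ENST first, RefSeq second."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()


# A few rows from the result above, in the assumed --batch layout.
SAMPLE = (
    "stable_id\tdisplay_label\n"
    "ENST00000517143\tNR_046932.1\n"
    "ENST00000427390\tNM_001145004.1\n"
    "ENST00000439682\tNM_001277304.1\n"
    "ENST00000439682\tNM_207355.2\n"
)

pairs = keep_mrna(parse_mysql_batch(SAMPLE))
print(to_tsv(pairs), end="")
```

Note that ENST00000439682 maps to two NM_ accessions, so any table built from this file needs to allow one-to-many mappings.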
modified 6 months ago by RamRS (27k) • written 6.0 years ago by Bert Overduin (3.7k)
3

Or if you want GenBank (EMBL) accession numbers instead:

mysql> SELECT transcript.stable_id, xref.display_label 
>       FROM translation, transcript, object_xref, xref,external_db 
>       WHERE transcript.transcript_id = translation.transcript_id 
>       AND translation.translation_id = object_xref.ensembl_id 
>       AND object_xref.ensembl_object_type = 'Translation' 
>       AND object_xref.xref_id = xref.xref_id 
>       AND xref.external_db_id = external_db.external_db_id 
>       AND external_db.db_name = 'EMBL';
modified 6 months ago by RamRS (27k) • written 6.0 years ago by Bert Overduin (3.7k)
6
6.0 years ago by dariober (11k), WCIP | Glasgow | UK
dariober wrote:

To query Ensembl/BioMart programmatically, you can use the R package biomaRt:

library("biomaRt")
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

values <- c("NM_001101", "NM_001256799", "NM_000594")

getBM(attributes = c("refseq_mrna", "ensembl_gene_id", "hgnc_symbol"),
      filters = "refseq_mrna", values = values, mart = ensembl)

Results:

   refseq_mrna ensembl_gene_id hgnc_symbol
1    NM_000594 ENSG00000232810         TNF
2    NM_001101 ENSG00000075624        ACTB
3 NM_001256799 ENSG00000111640       GAPDH

The output of getBM() can be written to a file in tabular format using write.table().

To see which datasets are available in BioMart and which attributes and filters they have:

listDatasets(useMart("ensembl"))
listFilters(ensembl)
listAttributes(ensembl)

I would also look into the various annotation databases maintained in Bioconductor, such as org.Hs.eg.db.

modified 9 months ago by RamRS (27k) • written 6.0 years ago by dariober (11k)
3
4.6 years ago by ashbigdeli (30), United States
ashbigdeli wrote:

I found this in another post, but I'll echo it here. For all ID conversions I have found this tool to be very useful, and it's updated regularly.

http://biodbnet.abcc.ncifcrf.gov/db/db2db.php

modified 6 months ago by RamRS (27k) • written 4.6 years ago by ashbigdeli (30)
2
6.0 years ago by David Westergaard (1.4k), Copenhagen, Denmark
David Westergaard wrote:

Why not just use http://www.ensembl.org/biomart/? For example, http://www.ensembl.org/biomart/martview/49b845b7dcb16ac15fcd8cd7d1461c6c gives you all RefSeq IDs and Ensembl transcript IDs in a tab-separated format. You can get the equivalent Perl code by clicking the "Perl" button.

written 6.0 years ago by David Westergaard (1.4k)

I prefer Perl to R, but the Perl Biomart API is pretty rough. I wouldn't rely on it for more than very simple bulk downloads. Also, depending on the species, some biotypes (e.g. mRNA) do not have NCBI RefSeq annotations available to them.

written 6.0 years ago by pld (4.8k)

That might be true. Honestly, I have never used the BioMart API myself, as I prefer Python to Perl. I always just take the XML and submit it as a query to BioMart, or simply download the full translation. There is no need to automate what only needs to be done once or twice. An example of automated queries can be found at Automating Database Searches.
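As a rough idea of what submitting the XML from Python can look like, here is a minimal sketch using only the standard library. It assumes the BioMart `martservice` endpoint under the Ensembl URL mentioned above, the `hsapiens_gene_ensembl` dataset, and the attribute names `ensembl_transcript_id` and `refseq_mrna`; the helper names are made up for illustration. Only the query and URL are built when the script runs; `fetch_pairs` needs network access and is defined but not called.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed endpoint: the BioMart web service under the Ensembl site.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, attributes):
    """Build a BioMart XML query asking for TSV output without a header."""
    attrs = "".join(
        f"<Attribute name={quoteattr(a)} />" for a in attributes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f"{attrs}</Dataset>"
        "</Query>"
    )


def build_url(xml):
    """Encode the XML into the `query` parameter of the service URL."""
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml})


def fetch_pairs(xml):
    """Run the query (network access needed) and return (ENST, RefSeq) pairs."""
    with urllib.request.urlopen(build_url(xml)) as resp:
        text = resp.read().decode("utf-8")
    rows = (line.split("\t") for line in text.splitlines())
    # Transcripts without a RefSeq mapping come back with an empty column.
    return [(r[0], r[1]) for r in rows if len(r) == 2 and r[1]]


xml = build_query(
    "hsapiens_gene_ensembl",
    ["ensembl_transcript_id", "refseq_mrna"],
)
print(build_url(xml))
```

The XML that the BioMart web interface generates for a given selection can be pasted in place of `build_query` if the attribute names differ for another species or release.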

written 6.0 years ago by David Westergaard (1.4k)