Question: RefSeq IDs (NM*, NR*) to Ensembl transcript IDs (ENST*)
5.1 years ago by
Avi70
United States
Avi70 wrote:

I have a dataset containing RefSeq transcript IDs, and I am unable to get a unique Ensembl transcript ID for each RefSeq ID.

I've tried BioMart, as suggested in some previous posts, but it did not give unique mappings to Ensembl, and approximately 6K of the RefSeq IDs cannot be found in the table.

I also tried the UCSC MySQL server. The suggested approach uses the table mrnaRefseq, but when I try to fetch the table with the command

```
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e "select * from mrnaRefSeq" > test1.txt
```

I get 

ERROR 1146 (42S02) at line 1: Table 'hg19.mrnaRefSeq' doesn't exist

 

Any suggestions, guys?

The data can be found here. The second column contains the relevant RefSeq IDs.

rna-seq • 4.9k views
modified 5.1 years ago • written 5.1 years ago by Avi70

This should work with the UCSC table browser:

https://genome.ucsc.edu/cgi-bin/hgTables

Select assembly: hg19, track: Ensembl Genes, output format: "selected fields from primary and related tables", and then press "get output". Under linked tables, pick hg19.knownToEnsembl, hg19.knownToRefSeq and hg19.kgXref (using the "allow selection" button at the bottom in between). Then just check whichever columns you want (gene symbol, Ensembl ID, RefSeq ID, etc.) and press "get output". This should create a tab-delimited file with the desired information.
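
If you export knownToRefSeq and knownToEnsembl separately instead of as one combined file, the join can be done in a few lines of Python. This is only a rough sketch: it assumes each table comes out as two columns, the UCSC known gene ID followed by the external ID, and the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def join_known_to_tables(known_to_refseq, known_to_ensembl):
    """Join two (kg_id, external_id) tables on the shared UCSC known gene ID.

    Returns a dict: RefSeq accession -> sorted list of Ensembl transcript IDs.
    The two-column layout is an assumption about the exported tables.
    """
    # UCSC known gene ID -> set of Ensembl transcript IDs
    ensembl_by_kg = defaultdict(set)
    for kg_id, enst in known_to_ensembl:
        ensembl_by_kg[kg_id].add(enst)

    # RefSeq accession -> set of Ensembl transcript IDs reached through the kg ID
    refseq_to_enst = defaultdict(set)
    for kg_id, refseq in known_to_refseq:
        refseq_to_enst[refseq].update(ensembl_by_kg.get(kg_id, ()))

    return {refseq: sorted(ensts) for refseq, ensts in refseq_to_enst.items()}


# Made-up rows for illustration only; these are not real identifiers.
known_to_refseq = [
    ("uc001aaa.1", "NM_000001"),
    ("uc001aab.1", "NM_000001"),
    ("uc001aac.1", "NR_000002"),
]
known_to_ensembl = [
    ("uc001aaa.1", "ENST00000000001"),
    ("uc001aab.1", "ENST00000000002"),
]

mapping = join_known_to_tables(known_to_refseq, known_to_ensembl)
```

A RefSeq ID that ends up with more than one Ensembl ID is a non-unique mapping, and one with an empty list has no Ensembl counterpart in these tables, so both problems from the question become easy to list.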

modified 14 months ago by Ram32k • written 5.1 years ago by trausch1.6k
5.1 years ago by
Amitm2.1k
UK
Amitm2.1k wrote:

hi,

Databases keep being updated, and if you work with large gene sets you will usually find that a number of IDs fall through the sieve. For example, NCBI RefSeq is under continuous manual curation, and if you check every other month you will see that some IDs get "suppressed".

I clicked your file link and it looks like Cufflinks output (?). If so, the best approach is to find out which GTF was used and where it came from (Ensembl, UCSC, etc.), ideally with a matching version. For example, I have been working with Ensembl release 73 data. Every time I need to annotate something in BioMart, I use the archived version linked to release 73 (by the way, you can also do this through the Bioconductor package biomaRt).

If finding the database version for your GTF is not an option, you can check this FTP link and download the gene2ensembl file, select the rows for the human taxon (9606, I think), and you have a mapping of RefSeq to Ensembl. I can't guarantee that all your RefSeq IDs will get a mapping, but this is the most comprehensive place for NCBI gene annotation.
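
Filtering gene2ensembl could look roughly like the Python sketch below. It assumes the file is tab-delimited with the taxon ID in the first column, the RefSeq RNA accession in the fourth and the Ensembl transcript ID in the fifth, with "-" for missing values; please check the header line of your download. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict

HUMAN_TAX_ID = "9606"


def refseq_to_ensembl(lines, tax_id=HUMAN_TAX_ID):
    """Build RefSeq RNA accession -> set of Ensembl transcript IDs for one taxon.

    Assumed column order (verify against the file header):
    tax_id, GeneID, Ensembl gene, RefSeq RNA, Ensembl RNA, RefSeq protein, Ensembl protein
    """
    mapping = defaultdict(set)
    for line in lines:
        # Skip the header/comment line and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != tax_id:
            continue
        refseq_rna, ensembl_rna = fields[3], fields[4]
        if refseq_rna == "-" or ensembl_rna == "-":
            continue
        # Drop version suffixes so NM_000001.2 matches NM_000001
        mapping[refseq_rna.split(".")[0]].add(ensembl_rna.split(".")[0])
    return dict(mapping)


# Made-up rows for illustration only; these are not real identifiers.
sample = [
    "#tax_id\tGeneID\tEnsembl_gene_identifier\tRNA_nucleotide_accession.version"
    "\tEnsembl_rna_identifier\tprotein_accession.version\tEnsembl_protein_identifier\n",
    "9606\t1\tENSG00000000001\tNM_000001.2\tENST00000000001\tNP_000001.1\tENSP00000000001\n",
    "9606\t1\tENSG00000000001\tNM_000001.2\tENST00000000002\tNP_000001.1\tENSP00000000002\n",
    "9606\t2\tENSG00000000002\tNR_000002.1\tENST00000000003\t-\t-\n",
    "10090\t3\tENSMUSG00000000001\tNM_000003.1\tENSMUST00000000001\tNP_000003.1\tENSMUSP00000000001\n",
]

mapping = refseq_to_ensembl(sample)

# RefSeq IDs from your own list that did not get any mapping
wanted = ["NM_000001", "NR_000002", "NM_999999"]
missing = [refseq for refseq in wanted if refseq not in mapping]
```

With the real file you would pass the open file handle instead of `sample`. Keeping a set per RefSeq ID makes it obvious where the mapping is not unique.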

modified 14 months ago by Ram32k • written 5.1 years ago by Amitm2.1k
5.1 years ago by
Avi70
United States
Avi70 wrote:

Thanks @Amitm. Yeah, I know that's the problem: I don't know which version of the GTF was used to create this.

I also tried the file you pointed to, and I am still missing approx. 7k RefSeq IDs.

written 5.1 years ago by Avi70

hi,

I'm not sure I know of another option. I am guessing that you have already tried gene ID converters, like the one on DAVID. Depending on your biological question, if you think those IDs must be annotated, then you can try this:

For the missing ones, you should have the genomic coordinates in the Cufflinks result file. You need to know at least whether they are on hg19, an earlier assembly, or the latest one. Then use those coordinates and a recent gene annotation file (NCBI, Ensembl, whatever suits you) for the same assembly to find out which annotated transcripts those coordinates overlap. I think that should solve your issue.
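
The overlap check itself is simple. Here is a minimal Python sketch, assuming 1-based inclusive coordinates as in a GTF and that both the query and the annotation are on the same assembly; the `Interval` type, the function name and the sample coordinates are all made up for illustration.

```python
from collections import namedtuple

# 1-based, inclusive coordinates (GTF convention is assumed here)
Interval = namedtuple("Interval", ["chrom", "start", "end", "strand", "name"])


def overlapping_annotations(query, annotations):
    """Return names of annotated transcripts that overlap the query
    on the same chromosome and the same strand."""
    return [
        ann.name
        for ann in annotations
        if ann.chrom == query.chrom
        and ann.strand == query.strand
        and ann.start <= query.end
        and query.start <= ann.end
    ]


# Made-up coordinates and identifiers for illustration only.
query = Interval("chr1", 1000, 2000, "+", "NM_000001")
annotations = [
    Interval("chr1", 1500, 2500, "+", "ENST00000000001"),  # overlaps
    Interval("chr1", 2001, 3000, "+", "ENST00000000002"),  # starts just after the query
    Interval("chr1", 1500, 2500, "-", "ENST00000000003"),  # opposite strand
    Interval("chr2", 1000, 2000, "+", "ENST00000000004"),  # other chromosome
]

hits = overlapping_annotations(query, annotations)
```

This is a plain linear scan, which is fine for a few thousand missing IDs; for much larger sets a sorted index or a dedicated interval tool would be faster. Requiring the same strand is a choice, and you can drop that condition if your coordinates are unstranded.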

written 5.1 years ago by Amitm2.1k