Does BioMart not convert between RefSeq and Ensembl transcripts on GRCh37?
22 months ago
Manuel ▴ 40

I have RefSeq transcript IDs and need the corresponding Ensembl transcript IDs, working with GRCh37.

With GRCh38 this can be done in BioMart, but I have just found that it is not possible with GRCh37.

From this link:

The RefSeq match option in BioMart is from the Matched Annotation from NCBI and EBI (MANE) collaboration between RefSeq and Ensembl. It has only been calculated for the up-to-date gene annotation on GRCh38 so cannot be obtained on GRCh37. You can get mapping from Ensembl to RefSeq transcripts through BioMart as RefSeq mRNA ID (refseq_mrna in R) but this is not a perfect match like the MANE, it is a mapping based on sequence similarity and similar genomic location, and there can be mismatches between them.

I am looking for an easy way to do this. I only do it once a year, with a list of only around 100 transcripts, so if possible I don't want to connect to the API via MySQL, Python or R.
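One low-effort route that avoids the API entirely, sketched under assumptions: run the query once in the GRCh37 BioMart web interface (grch37.ensembl.org), export the Ensembl transcript ID and RefSeq mRNA ID columns as a TSV, and join your ~100 IDs against that file locally. The script only reads the downloaded file; it does not connect to anything. It assumes the export headers are `Transcript stable ID` and `RefSeq mRNA ID` (adjust to your actual file), and it strips version suffixes on both sides so the match works whether or not the export includes them. As the quoted reply warns, these are similarity-based mappings, so an ID can have several candidates or none.

```python
import csv
import io
from collections import defaultdict


def strip_version(accession):
    """Drop a trailing '.N' version, e.g. 'NM_007313.2' -> 'NM_007313'."""
    return accession.strip().split(".", 1)[0]


def load_biomart_export(handle, enst_col="Transcript stable ID",
                        refseq_col="RefSeq mRNA ID"):
    """Map unversioned RefSeq mRNA IDs to the set of Ensembl transcript IDs.

    The column names are assumptions about the BioMart TSV header; change them
    to match your download.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        enst = (row.get(enst_col) or "").strip()
        refseq = (row.get(refseq_col) or "").strip()
        if enst and refseq:  # rows with an empty RefSeq column have no mapping
            mapping[strip_version(refseq)].add(enst)
    return mapping


def convert(refseq_ids, mapping):
    """Return {input ID: sorted Ensembl candidates}; an empty list means no match."""
    return {rid: sorted(mapping.get(strip_version(rid), set()))
            for rid in refseq_ids}


if __name__ == "__main__":
    # Toy rows borrowed from the preferred-transcripts table in this thread,
    # NOT real BioMart output. In practice, open the downloaded file instead
    # (e.g. a hypothetical "mart_export.txt").
    export = io.StringIO(
        "Transcript stable ID\tRefSeq mRNA ID\n"
        "ENST00000372348\tNM_007313\n"
        "ENST00000318560\tNM_005157\n"
    )
    mapping = load_biomart_export(export)
    for rid, hits in convert(["NM_007313.2", "NM_005157.5", "NM_000216.3"],
                             mapping).items():
        print(rid, ",".join(hits) or "NO MATCH")
```

Any input that comes back with no match or with several candidates is worth checking by hand in the genome browser.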

Transcripts Ensembl GRCh37 RefSeq
22 months ago

The translation you'll get from BioMart will still be the best available, despite the warning. Before the MANE project started (which was after the retirement of GRCh37), there was no correspondence between the RefSeq and Ensembl annotations of the genome: they didn't agree on the exact locations and structures of transcripts, and the only way to convert between IDs was to find sequences that overlapped.
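To make "find sequences that overlapped" concrete, here is a minimal sketch of coordinate-overlap matching: given one RefSeq transcript's exons and the exons of candidate Ensembl transcripts on the same chromosome and strand, score each candidate by shared exon bases divided by combined exon length (a Jaccard-style index) and keep the best. This is an illustrative technique, not how Ensembl actually computes its RefSeq cross-references; the coordinates and the `tx_A`/`tx_B` names are hypothetical.

```python
def total_overlap(a, b):
    """Bases shared by two sorted, non-overlapping lists of half-open (start, end) exons."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared += end - start
        # Move past whichever exon finishes first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def exon_length(exons):
    """Total number of bases covered by a list of half-open exons."""
    return sum(end - start for start, end in exons)


def best_match(refseq_exons, candidates):
    """Pick the candidate with the highest shared / union exon-base ratio (Jaccard-style)."""
    best_id, best_score = None, 0.0
    for tx_id, exons in candidates.items():
        shared = total_overlap(refseq_exons, exons)
        union = exon_length(refseq_exons) + exon_length(exons) - shared
        score = shared / union if union else 0.0
        if score > best_score:
            best_id, best_score = tx_id, score
    return best_id, best_score


if __name__ == "__main__":
    # Hypothetical coordinates on one chromosome/strand, not real transcripts.
    refseq = [(100, 200), (300, 400)]
    candidates = {"tx_A": [(100, 200), (300, 420)], "tx_B": [(150, 250)]}
    tx, score = best_match(refseq, candidates)
    print(tx, round(score, 3))  # tx_A 0.909
```

A score below 1.0 for the best candidate is exactly the "not a perfect match" case: the two annotations disagree on at least one exon boundary.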


Right, thanks. The next step is to call the previous bioinformatician who worked in the clinical lab where I now work, because he did this:

Ensembl (PreferredTranscripts.txt)    RefSeq
ENST00000372348 NM_007313.2
ENST00000318560 NM_005157.5
ENST00000372348 NM_007313.2
ENST00000318560 NM_005157.5
ENST00000224784 NM_001613.2
ENST00000224784 NM_001613.2
ENST00000242057 NM_001621.4
ENST00000262648 NM_000216.3
ENST00000519295 NM_003664.4
ENST00000519295 NM_003664.4
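Because the table above repeats several rows, a quick sanity check before building anything from it is to drop exact duplicates and confirm that each Ensembl ID pairs with exactly one RefSeq ID and vice versa. A minimal sketch, assuming the file is whitespace-separated with the Ensembl ID in the first column; the function names are hypothetical and the sample rows are copied from the table.

```python
import re
from collections import defaultdict

ENST_RE = re.compile(r"ENST\d+(\.\d+)?")


def load_pairs(lines):
    """Parse 'ENST  NM' lines, skipping headers/blank lines and exact duplicate rows."""
    pairs, seen = [], set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not ENST_RE.fullmatch(fields[0]):
            continue  # header, blank, or malformed line
        pair = (fields[0], fields[1])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def conflicts(pairs):
    """IDs paired with more than one partner, in either direction."""
    enst_to_nm, nm_to_enst = defaultdict(set), defaultdict(set)
    for enst, nm in pairs:
        enst_to_nm[enst].add(nm)
        nm_to_enst[nm].add(enst)
    bad = {k: sorted(v) for k, v in enst_to_nm.items() if len(v) > 1}
    bad.update({k: sorted(v) for k, v in nm_to_enst.items() if len(v) > 1})
    return bad


if __name__ == "__main__":
    # Rows copied from the table in this thread; in practice read the real file.
    table = """\
ENST00000372348 NM_007313.2
ENST00000318560 NM_005157.5
ENST00000372348 NM_007313.2
ENST00000318560 NM_005157.5
ENST00000224784 NM_001613.2
ENST00000224784 NM_001613.2
ENST00000242057 NM_001621.4
ENST00000262648 NM_000216.3
ENST00000519295 NM_003664.4
ENST00000519295 NM_003664.4
"""
    pairs = load_pairs(table.splitlines())
    print(len(pairs), "unique pairs; conflicts:", conflicts(pairs) or "none")
```

On these rows it prints `6 unique pairs; conflicts: none`, so the duplicates are harmless repeats rather than competing assignments.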

The NM IDs were used to create a BED file that filters our analysis. Then, at the end of the pipeline, the Ensembl IDs are used in the annotation process to keep only variants that fall in these transcripts. As you mentioned, if the paired transcripts do not have exactly the same coordinates, we might lose variants.
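The risk described here can be checked directly: any part of the NM-derived BED regions that is not covered by the exons of the paired Ensembl transcript is a place where a variant passes the BED filter but is then discarded by the Ensembl-based annotation step. A minimal sketch, assuming both interval sets are on the same chromosome in GRCh37 and use BED-style 0-based, half-open coordinates (subtract 1 from a VCF `POS` before testing); the coordinates and helper names are hypothetical.

```python
def merge(intervals):
    """Sort and merge overlapping or touching half-open intervals."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def uncovered(target, cover):
    """Parts of `target` (e.g. NM-derived BED regions) not covered by `cover` (e.g. ENST exons)."""
    cover = merge(cover)
    gaps = []
    for s, e in merge(target):
        pos = s  # first base of this target interval not yet accounted for
        for cs, ce in cover:
            if ce <= pos:
                continue  # cover interval lies entirely before pos
            if cs >= e:
                break  # cover interval lies entirely after this target interval
            if cs > pos:
                gaps.append((pos, cs))
            pos = max(pos, ce)
            if pos >= e:
                break
        if pos < e:
            gaps.append((pos, e))
    return gaps


def variants_at_risk(positions, gaps):
    """0-based positions inside a gap: kept by the BED filter, dropped by the ENST filter."""
    return [p for p in positions if any(s <= p < e for s, e in gaps)]


if __name__ == "__main__":
    # Hypothetical coordinates on one chromosome, not real transcripts.
    nm_bed = [(100, 200), (300, 400)]      # regions built from the NM transcript
    enst_exons = [(100, 200), (300, 380)]  # exons of the paired ENST transcript
    gaps = uncovered(nm_bed, enst_exons)
    print(gaps)                                     # [(380, 400)]
    print(variants_at_risk([150, 350, 390], gaps))  # [390]
```

Running this for each pair in the preferred-transcripts table would show whether any of them actually has such gaps; an empty gap list means the Ensembl filter cannot drop anything the BED file let through.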
