Discrepancies Between biomaRt Package And NCBI Database
11.4 years ago

Hello everybody, I'm working with the biomaRt Bioconductor package to convert a list of RefSeq mRNA identifiers to official gene symbols. My query list contains 28890 unique RefSeq IDs, but the output of the biomaRt query contains only 27533 RefSeq IDs, so it seems that more than 1000 RefSeq transcripts are not mapped to gene symbols. However, when I search for these unmapped transcripts on the NCBI website, I find that they are associated with specific gene symbols.

For example, if I look up the gene symbol of "NM_002071" with the function getBM() I get no output, but on the NCBI site I find that this RefSeq ID is associated with the GNAL gene symbol.
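One way to pin down exactly which identifiers were dropped is to compare the query list against the identifiers that came back with a symbol. The sketch below is plain Python and independent of biomaRt. The function name `find_unmapped` and the input format (pairs of RefSeq ID and gene symbol) are assumptions made for illustration, and the example data other than NM_002071 is made up.

```python
def find_unmapped(query_ids, mapped_pairs):
    """Return the query IDs that have no non-empty gene symbol.

    query_ids: iterable of RefSeq accessions that were submitted.
    mapped_pairs: iterable of (refseq_id, gene_symbol) tuples returned
    by the conversion step (hypothetical format).
    """
    # Keep only IDs that came back with a non-empty symbol.
    mapped = {rid for rid, symbol in mapped_pairs if symbol}

    # Walk the query in order, skipping duplicates.
    seen = set()
    unmapped = []
    for rid in query_ids:
        if rid not in mapped and rid not in seen:
            seen.add(rid)
            unmapped.append(rid)
    return unmapped


if __name__ == "__main__":
    # Illustrative data only: ID_A and ID_B are placeholders.
    query = ["NM_002071", "ID_A", "ID_B"]
    result = [("ID_A", "GENE_A"), ("ID_B", "")]
    print(find_unmapped(query, result))
```

The resulting list can then be checked against NCBI one record at a time, or in bulk, to see which entries are still current.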

Can anyone explain these discrepancies?

Thank you.

Matteo

biomart ncbi conversion • 3.2k views
11.4 years ago

It looks like this transcript was removed from NCBI due to "insufficient support", i.e. a lack of evidence for the transcript. Also, BioMart is really EBI-centric, so NCBI resources may not be a great comparison.

I don't agree that BioMart is "EBI-centric"; whilst EMBL-EBI is a partner, it aggregates data from multiple sources. You are correct that in this case, the issue is that the transcript record has been retired due to lack of evidence.

Agreed. I guess my point should have been that sometimes databases are out of sync, and this can cause issues.

11.4 years ago

Indeed, when I searched for this transcript I only looked at the summary section of the web page, where the RefSeq status is given as validated. Looking at the list of transcripts for the GNAL gene, however, my RefSeq query is not among them and is instead marked as suppressed. It would be useful to have this information in the summary section as well! I will check whether the other RefSeq IDs not converted by the biomaRt function are also obsolete. Thank you for your help, it was very useful!

11.4 years ago

I've checked whether all my unmapped RefSeq IDs are marked as obsolete by NCBI, but that is not the case. For example, with the biomaRt package, transcript NM_001198531 is not associated with a gene symbol, but on NCBI it is listed as a reviewed transcript of gene TCF7L2. So I did a manual conversion of this RefSeq ID on the BioMart website (Ensembl 69) to find out whether the problem lies in the Bioconductor package, but again I got no results. I then ran the same search on the BioMart website with Ensembl version 68 and found that my RefSeq ID maps to gene TCF7L2, with Ensembl transcript ID ENST00000538897 (the same result given by NCBI). Finally, I searched for this Ensembl transcript ID in the current version of Ensembl (69) and found that the transcript still exists and is associated with gene TCF7L2, but if I try to convert this Ensembl transcript to RefSeq I get an empty result.

So my question is: why, in version 69, is this RefSeq ID no longer associated with the TCF7L2 gene symbol and Ensembl transcript ID ENST00000538897, if this transcript has not changed from version 68 to 69?
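To see whether other identifiers lost their mapping in the same way, the two releases could be compared directly. This is a minimal Python sketch, assuming each release's mapping has been exported as a dict from RefSeq ID to gene symbol. The name `lost_between_releases` and that export format are hypothetical, and the only real values in the example are the ones quoted above.

```python
def lost_between_releases(old_map, new_map):
    """Return {refseq_id: old_symbol} for IDs mapped in old_map
    but missing (or empty) in new_map."""
    return {
        rid: symbol
        for rid, symbol in old_map.items()
        if symbol and not new_map.get(rid)
    }


if __name__ == "__main__":
    # Release 68 mapped this ID; release 69 returned nothing for it.
    release_68 = {"NM_001198531": "TCF7L2"}
    release_69 = {}
    print(lost_between_releases(release_68, release_69))
```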

Thank you!
