Hello everybody, I'm working with the biomaRt Bioconductor package to convert a list of RefSeq mRNA identifiers to official gene symbols. My query list contains 28890 unique RefSeq IDs, but the biomaRt output contains only 27533 RefSeq IDs, so it seems that 1357 RefSeq transcripts (28890 − 27533) are not mapped to gene symbols. However, when I search for these unmapped transcripts on the NCBI website, I find that they are associated with specific gene symbols.
For example, if I look up the gene symbol for "NM_002071" with getBM(), I get no output, but on the NCBI site this RefSeq ID is associated with the GNAL gene symbol.
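To see exactly which query IDs were dropped, a set difference between the query list and the returned IDs is enough. The following is a minimal sketch in Python rather than R, purely for illustration. It assumes the getBM() result has been exported as (refseq_mrna, hgnc_symbol) pairs and that a missing symbol appears as an empty string. It also assumes the mart may return unversioned accessions, so it compares IDs with any ".N" version suffix removed. The helper names `strip_version` and `find_unmapped` are hypothetical and not part of biomaRt, and the version suffixes in the example are made up.

```python
def strip_version(refseq_id: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. 'NM_002071.4' -> 'NM_002071'."""
    return refseq_id.split(".", 1)[0]


def find_unmapped(query_ids, bm_rows):
    """Return the query IDs that have no non-empty gene symbol in the getBM() output.

    query_ids: iterable of RefSeq IDs as submitted (possibly versioned).
    bm_rows:   iterable of (refseq_mrna, hgnc_symbol) pairs exported from getBM()
               (hypothetical export format; an empty symbol counts as unmapped).
    """
    # Unversioned accessions that came back with at least one gene symbol.
    mapped = {strip_version(acc) for acc, symbol in bm_rows if symbol}
    # Keep every query ID whose unversioned form was never mapped.
    return sorted({q for q in query_ids if strip_version(q) not in mapped})


# Illustrative data only; the version suffixes are made up.
query = ["NM_002071.4", "NM_000546.6", "NM_007294"]
rows = [("NM_000546", "TP53"), ("NM_007294", "BRCA1")]
print(find_unmapped(query, rows))  # ['NM_002071.4']
```

Running this over the full query list should give the roughly 1357 missing IDs, which can then be checked individually on the NCBI site.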
Could anyone explain these discrepancies?
Thank you.
Matteo
I don't agree that BioMart is "EBI-centric"; whilst EMBL-EBI is a partner, it aggregates data from multiple sources. You are correct that in this case, the issue is that the transcript record has been retired due to lack of evidence.
Agreed. I guess my point should have been that databases are sometimes out of sync, and this can cause mapping discrepancies like this one.