Question: Discrepancies Between biomaRt Package And NCBI Database
7.1 years ago, Matteo Dugo (Milan - Italy) wrote:

Hello everybody, I'm working with the biomaRt Bioconductor package to convert a list of RefSeq mRNA identifiers to official gene symbols. My query list contains 28890 unique RefSeq IDs, but the output of the biomaRt query contains only 27533 RefSeq IDs, so it seems that more than 1000 RefSeq transcripts are not mapped to gene symbols. However, when I search for these unmapped transcripts on the NCBI website, I find that they are associated with specific gene symbols.

For example, if I look up the gene symbol for "NM_002071" with the function getBM() I get no output, but on the NCBI site I find that this RefSeq ID is associated with the gene symbol GNAL.
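For reference, a lookup of this kind might look like the sketch below. The post does not show the actual call, so the mart name, the dataset (`hsapiens_gene_ensembl`), the `refseq_mrna` filter and the `hgnc_symbol` attribute are assumptions about a typical human RefSeq-to-symbol query. The helper names `map_refseq` and `unmapped_refseq` are hypothetical; the second one simply lists the query IDs that are missing from the result table.

```r
# Sketch only: map RefSeq mRNA IDs to HGNC symbols and list the IDs that
# come back without a mapping. The actual query needs the Bioconductor
# package biomaRt and network access.

# Query Ensembl BioMart. Dataset, filter and attribute names are
# assumptions; the original post does not state them.
map_refseq <- function(refseq_ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("refseq_mrna", "hgnc_symbol"),
    filters    = "refseq_mrna",
    values     = refseq_ids,
    mart       = mart
  )
}

# IDs present in the query but absent from the result table.
unmapped_refseq <- function(refseq_ids, mapping) {
  setdiff(unique(refseq_ids), mapping$refseq_mrna)
}

# Example (not run here, needs network):
# ids     <- c("NM_002071", "NM_001198531")
# mapping <- map_refseq(ids)
# unmapped_refseq(ids, mapping)
```

Keeping the comparison step separate from the query makes it easy to collect all of the unmapped IDs in one vector and check them individually at NCBI.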

Could anyone explain these discrepancies?

Thank you.

Matteo

ncbi biomart conversion • 2.3k views
7.1 years ago, Matt Shirley (Cambridge, MA) wrote:

It looks like this transcript was removed from NCBI due to "insufficient support", i.e. a lack of evidence for the transcript. Also, BioMart is really EBI-centric, so NCBI resources may not be a great comparison.


I don't agree that BioMart is "EBI-centric"; whilst EMBL-EBI is a partner, it aggregates data from multiple sources. You are correct that in this case, the issue is that the transcript record has been retired due to lack of evidence.

Reply written 7.1 years ago by Neilfws

Agreed. I guess my point should have been that sometimes databases are out of sync, and this can cause issues.

Reply written 7.1 years ago by Matt Shirley
7.1 years ago, Matteo Dugo (Milan - Italy) wrote:

Indeed, when I searched for this transcript I only looked at the summary section of the web page, where the RefSeq status is given as validated. But in the list of transcripts for the GNAL gene, my RefSeq ID is not listed and is reported as suppressed. It would be useful to have this information in the summary section as well! I will check whether the other RefSeq IDs not converted by biomaRt are also obsolete. Thank you for your help, it was very useful!

7.1 years ago, Matteo Dugo (Milan - Italy) wrote:

I've checked whether all my unmapped RefSeq IDs are marked as obsolete by NCBI, but they are not. For example, with the biomaRt package, transcript NM_001198531 is not associated with a gene symbol, but on NCBI it is listed as a reviewed transcript of the gene TCF7L2. So I did a manual conversion of this RefSeq ID on the BioMart website (Ensembl 69), to find out whether the problem lies in the Bioconductor package, but again I got no results. I then ran the same search on the BioMart website with Ensembl version 68 and found that my RefSeq ID maps to the gene TCF7L2, with Ensembl transcript ID ENST00000538897 (the same result given by NCBI). Finally, I searched for this Ensembl transcript ID in the current version of Ensembl (69) and found that the transcript still exists and is associated with the gene TCF7L2, but if I try to convert this Ensembl transcript to RefSeq I get an empty result.

So my question is: why, in version 69, is this RefSeq ID no longer associated with the TCF7L2 gene symbol and Ensembl transcript ID ENST00000538897, if the transcript has not changed from version 68 to 69?
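The release-to-release check described above was done by hand on the BioMart website; a scripted version might look roughly like the sketch below. The release numbers 68 and 69 come from the post, but whether archives that old are still served is an assumption that would need checking with `biomaRt::listEnsemblArchives()`. The dataset and attribute names are likewise assumptions, and the helper names `fetch_mapping` and `lost_between_releases` are hypothetical.

```r
# Sketch only: look up the same RefSeq IDs in two Ensembl releases and
# report the IDs that lost their mapping. Needs biomaRt and network
# access for the actual queries.

# Fetch RefSeq -> Ensembl transcript / gene symbol mappings from a given
# Ensembl release (availability of that archive is not guaranteed).
fetch_mapping <- function(refseq_ids, version) {
  mart <- biomaRt::useEnsembl(
    biomart = "ensembl",
    dataset = "hsapiens_gene_ensembl",
    version = version
  )
  biomaRt::getBM(
    attributes = c("refseq_mrna", "ensembl_transcript_id", "hgnc_symbol"),
    filters    = "refseq_mrna",
    values     = refseq_ids,
    mart       = mart
  )
}

# IDs mapped in the older release but missing from the newer one.
lost_between_releases <- function(old_mapping, new_mapping) {
  setdiff(old_mapping$refseq_mrna, new_mapping$refseq_mrna)
}

# Example (not run here, needs network and available archives):
# old <- fetch_mapping("NM_001198531", version = 68)
# new <- fetch_mapping("NM_001198531", version = 69)
# lost_between_releases(old, new)
```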

Thank you!
