Question: Missing gene symbols in biomart
oganm (Canada) wrote 3.1 years ago:

I have a set of Swiss-Prot IDs that I want to convert to HGNC symbols. I am using biomaRt for this, and it works for most genes. For a small minority, however, it cannot find a symbol for a given ID, even though the BioMart web interface handles the conversion successfully. I have added an example for a single gene below.

 

    library(biomaRt)

    # connect to the Ensembl human gene dataset
    humanMart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    # request the HGNC symbol and Ensembl gene ID for a single Swiss-Prot accession
    humanTrans = getBM(attributes = c('uniprot_swissprot', 'hgnc_symbol', 'ensembl_gene_id'),
                       filters = 'uniprot_swissprot',
                       values = 'Q9Y2R4',
                       mart = humanMart)


> humanTrans[humanTrans$uniprot_swissprot %in% 'Q9Y2R4',]
      uniprot_swissprot hgnc_symbol ensembl_gene_id
15191            Q9Y2R4             ENSG00000277594

 

biomart R • 1.3k views
Neilfws (Sydney, Australia) wrote 3.1 years ago:

R/biomaRt connects to the exact same data source as the Ensembl web interface and should yield equivalent results if used correctly.

The example UniProt accession that you give does not map to an HGNC symbol in the web interface either (you may need to click "Results" to see this).
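To list which accessions came back without a symbol, something like the following may help. It is only a sketch: it builds a small stand-in data frame instead of calling getBM(), so it runs offline. The first row copies the output shown in the question; the TP53 row and the names hasSymbol and unmapped are illustrative additions, and the check assumes that a missing symbol arrives as an empty string (as the blank cell in that output suggests) or as NA.

```r
# Stand-in for a getBM() result. Row 1 copies the output from the question;
# row 2 (TP53) is added purely for illustration.
humanTrans = data.frame(
  uniprot_swissprot = c("Q9Y2R4", "P04637"),
  hgnc_symbol       = c("", "TP53"),
  ensembl_gene_id   = c("ENSG00000277594", "ENSG00000141510"),
  stringsAsFactors  = FALSE
)

# Treat both NA and the empty string as "no symbol attached"
hasSymbol = !is.na(humanTrans$hgnc_symbol) & nzchar(humanTrans$hgnc_symbol)

# Accessions for which no HGNC symbol was returned
unmapped = unique(humanTrans$uniprot_swissprot[!hasSymbol])
print(unmapped)
```

The resulting vector can then be checked by hand against UniProt or HGNC, since the mapping is absent from Ensembl itself.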


Thanks. In that case, do you know where the data for BioMart's ID converter web interface comes from?

Reply written 3.1 years ago by oganm

From Ensembl.

Reply written 3.1 years ago by Bert Overduin
cdsouthan wrote 3.1 years ago:

Your small minority should be the difference between these results:

http://www.uniprot.org/uniprot/?query=organism:%22homo%20sapiens%22&fil=reviewed%3Ayes&sort=score   = 20,196

http://www.uniprot.org/uniprot/?query=organism%3A%22homo+sapiens%22+AND+reviewed%3Ayes+AND+database%3A%28type%3Ahgnc%29&sort=score   = 19,817

but if you add in the Ensembl mappings (from the UniProt side), the intersect drops some more:

http://www.uniprot.org/uniprot/?query=organism%3A%22homo+sapiens%22+AND+database%3A%28type%3Ahgnc%29+AND+reviewed%3Ayes+AND+database%3A%28type%3Aensembl*%29&sort=score  = 18,693

There are a number of reasons for the individual mismatches.
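To put numbers on those gaps, the three counts quoted above can simply be subtracted. The variable names here are illustrative, and the counts are the ones reported by the queries at the time of writing, so they will drift as UniProt is updated.

```r
# Result counts quoted from the three UniProt queries above
reviewedHuman      = 20196  # reviewed human entries
withHgnc           = 19817  # ... that also carry an HGNC cross-reference
withHgncAndEnsembl = 18693  # ... and an Ensembl cross-reference as well

# Reviewed entries with no HGNC cross-reference
missingHgnc = reviewedHuman - withHgnc
# Entries with HGNC but no Ensembl cross-reference
missingEnsembl = withHgnc - withHgncAndEnsembl

print(c(missingHgnc = missingHgnc, missingEnsembl = missingEnsembl))
```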
