Retrieving All Uniprot/Gene Ids From Bioconductor Biomart
10.2 years ago

I know that with Ensembl BioMart, through the web service or BioPerl, one can easily select whole datasets of interest and download them. Reading the biomaRt documentation for Bioconductor, I could not work out whether it has such a feature. I know the getBM function retrieves information of interest for a given vector of genes, but it does not appear to have a feature for downloading the whole dataset.

What I want exactly is a data frame with UniProt IDs, the associated gene names and Ensembl gene IDs.

Thanks, Mehran

biomart bioconductor retrieval uniprot id • 5.3k views
10.2 years ago
Emily 23k

You can't reliably get data for all genes in BioMart. In principle this should work: you would use the genes dataset without any filters, then select your IDs as attributes. In practice, the query tends to break down partway through, without warning, and returns only a partial dataset, leaving you to puzzle over what went wrong. This is because BioMart is a little clunky and cannot handle very large result sets, such as every gene in the genome.

There are a few solutions to this problem. The best is really to use the Perl API. You can refer to our online course here to learn how.
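For readers who would rather stay in R, one possible workaround is to split the unfiltered query into smaller pieces, for example one getBM call per chromosome, and bind the results. This is only a sketch and not something the answer above proposes: the helper names (`make_biomart_fetcher`, `fetch_by_chromosome`) are hypothetical, the attribute and filter names are assumed to match the current human Ensembl mart, and whether chunking avoids the truncation described above is an assumption you should verify by comparing row counts.

```r
# Attributes to retrieve (assumed names for the human Ensembl gene mart).
attrs <- c("ensembl_gene_id", "external_gene_name", "uniprotswissprot")

# Hypothetical helper: returns a function that queries BioMart for one chromosome.
# Requires the biomaRt package and network access when actually called.
make_biomart_fetcher <- function() {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  function(chr) {
    biomaRt::getBM(attributes = attrs,
                   filters    = "chromosome_name",
                   values     = chr,
                   mart       = mart)
  }
}

# Hypothetical helper: run one query per chromosome and combine the pieces.
fetch_by_chromosome <- function(chromosomes, fetch_one) {
  parts <- lapply(chromosomes, fetch_one)
  unique(do.call(rbind, parts))
}

# Usage (needs network access, so it is left commented out):
# all_genes <- fetch_by_chromosome(c(1:22, "X", "Y", "MT"), make_biomart_fetcher())
```

Passing the query function as an argument keeps the combining step independent of BioMart, so it can be checked with a stub before running the real queries.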

10.2 years ago
Irsan ★ 7.8k

You can also use the org.Hs.eg.db Bioconductor annotation package. To get all mappings between Entrez gene IDs and UniProt IDs, do:

library(org.Hs.eg.db)
x <- org.Hs.egUNIPROT
# Get the Entrez gene IDs that are mapped to a UniProt ID
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])
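
Since the question asks for a data frame, the list `xx` can be flattened into one row per Entrez/UniProt pair. The sketch below uses a small made-up list in place of the real `xx` so that it runs without the annotation package; the column names `entrez_id` and `uniprot_id` are illustrative choices, not part of org.Hs.eg.db.

```r
# Toy stand-in for `xx`: names are Entrez gene IDs, values are UniProt IDs.
# The IDs here are illustrative only; with the real package, use the `xx` built above.
xx <- list("1" = c("P04217", "V9HWD8"),
           "2" = "P01023")

# One row per (Entrez ID, UniProt ID) pair; genes with several UniProt IDs
# are repeated once per ID.
mapping <- data.frame(
  entrez_id  = rep(names(xx), lengths(xx)),
  uniprot_id = unlist(xx, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(mapping)
```

If gene symbols and Ensembl gene IDs are needed in the same table, it may be simpler to query the package directly with `AnnotationDbi::select(org.Hs.eg.db, keys = keys(org.Hs.eg.db), columns = c("UNIPROT", "SYMBOL", "ENSEMBL"), keytype = "ENTREZID")`, which should return a data frame with one row per combination of IDs.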