Question: Biomart Bioconductor - Retrieving All Entrezgenes Of Hsapiens_Gene_Ensembl
6.8 years ago, sthait wrote:

Hello,

I'm working with the biomaRt package in R. I'm trying to retrieve all Entrez genes of the hsapiens_gene_ensembl dataset, filtering by gene type (protein coding) and requesting the Entrez gene ID as the attribute.

So far I have done the following:

library(biomaRt)
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

I'm not sure how to call the getBM function so that it returns results for all genes in the human dataset rather than for a specific list of values.

Thanks for your help,

Tom :)

R biomart bioconductor • 18k views
Modified 21 months ago by macmath • written 6.8 years ago by sthait
6.8 years ago, Vikas Bansal (Berlin, Germany) wrote:

Hi,

I have not used biomaRt for the last 2-3 months, but here is what I was using to play around with it:

listMarts()    # see which BioMart databases are available

ensembl = useMart("ensembl")    # use the Ensembl database

listDatasets(ensembl)    # see which datasets are available in Ensembl

ensembl = useDataset("hsapiens_gene_ensembl", mart = ensembl)    # select the Homo sapiens gene dataset

listFilters(ensembl)    # check which filters are available

listAttributes(ensembl)    # check which attributes are available to select

# get gene IDs and gene names from the database
# (gene_names is assumed to be a character vector defined beforehand)
genes.with.id = getBM(attributes = c("ensembl_gene_id", "external_gene_id"), values = gene_names, mart = ensembl)
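The tables returned by listFilters() and listAttributes() are long, so it can help to search them rather than read them top to bottom. The following is a minimal sketch of such a search; `find_entries` is a hypothetical helper, and the toy table only stands in for real listFilters() output (its rows are illustrative, not guaranteed to match the current Ensembl release).

```r
# Hypothetical helper: search the table returned by listFilters() or
# listAttributes() (columns `name` and `description`) for a pattern.
find_entries <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Toy stand-in for listFilters(ensembl); the rows are illustrative only.
toy_filters <- data.frame(
  name = c("chromosome_name", "biotype", "ensembl_gene_id"),
  description = c("Chromosome/scaffold name", "Type", "Gene stable ID(s)"),
  stringsAsFactors = FALSE
)

print(find_entries(toy_filters, "type"))
```

With a live connection, the same helper could be called as find_entries(listFilters(ensembl), "type") to look for a gene type filter.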

Thanks!

Let me be more specific. My goal is to download all FASTA sequences under the following conditions:

dataset: hsapiens_gene_ensembl
filter: gene type = protein coding
attributes: Ensembl gene ID, Ensembl transcript ID, associated gene name, chromosome name, strand, transcript start
sequences: 5' UTR, 3000 bp upstream flank

In the Ensembl BioMart web interface I got 21976/57945 matches and downloaded them as a gzipped FASTA file.

I would like to do the same with the biomaRt Bioconductor package in R.

I tried the getSequence function, but I don't know how to retrieve all sequences for hsapiens.

Thanks a lot,

tom

Reply written 6.7 years ago by sthait
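One possible way to approach the request above is sketched below. It is not a tested recipe: it assumes the gene type filter is called `biotype` with the value `protein_coding`, and that getSequence() accepts `seqType = "5utr"` and `seqType = "gene_flank"` together with `upstream`; these should be checked against listFilters() and the getSequence() help page. `chunk_ids` and `fetch_protein_coding_seqs` are hypothetical helper names, and the batch size of 500 is arbitrary.

```r
# Sketch only: filter and seqType names are assumptions, see the text above.

# Hypothetical helper: split a vector of IDs into chunks of at most `size`.
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical helper: fetch one sequence type for all protein-coding genes,
# querying in batches so that no single request becomes too large.
fetch_protein_coding_seqs <- function(mart, seq_type = "5utr",
                                      upstream = NULL, batch_size = 500) {
  genes <- biomaRt::getBM(attributes = "ensembl_gene_id",
                          filters = "biotype", values = "protein_coding",
                          mart = mart)
  batches <- chunk_ids(genes$ensembl_gene_id, batch_size)
  seqs <- lapply(batches, function(ids) {
    if (is.null(upstream)) {
      biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                           seqType = seq_type, mart = mart)
    } else {
      biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                           seqType = seq_type, upstream = upstream,
                           mart = mart)
    }
  })
  do.call(rbind, seqs)
}

# Possible usage (needs the biomaRt package and network access):
# library(biomaRt)
# mart  <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# utr5  <- fetch_protein_coding_seqs(mart, "5utr")
# flank <- fetch_protein_coding_seqs(mart, "gene_flank", upstream = 3000)
# exportFASTA(utr5, file = "utr5.fasta")
```

Batching is a design choice rather than a requirement; it simply keeps each request small, which may help when the full gene list is long.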
4.9 years ago, Stephane Plaisance (Leuven area, Belgium) wrote:

Use a star in the values field (I only need the Entrez ID, but you can add more attributes here):

all.entrezgene <- unique( getBM(attributes = "entrezgene",
                    values = "*", 
                    mart = ensembl) )


Did not work for me; see my answer.

Reply written 2.5 years ago by t.kuilman
2.5 years ago, t.kuilman (Netherlands) wrote:

I tried the method suggested by Stephane, but this did not work for me:

> library("biomaRt")
> ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mapping <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                   filters = "ensembl_gene_id" , values = list("*"), mart = ensembl)
> head(mapping)
[1] ensembl_gene_id hgnc_symbol    
<0 rows> (or 0-length row.names)

However, leaving out the filters and values did the trick for me:

> library("biomaRt")
> ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mapping <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), mart = ensembl)
> head(mapping)
  ensembl_gene_id hgnc_symbol
1 ENSG00000252303   RNU6-280P
2 ENSG00000281771            
3 ENSG00000281256            
4 ENSG00000283272            
5 ENSG00000280864            
6 ENSG00000280792            
> dim(mapping)
[1] 63325     2
Modified 8 months ago • written 2.5 years ago by t.kuilman
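The output above shows that many genes come back with an empty hgnc_symbol. A small sketch of how such rows could be dropped after the query follows; the data frame is a toy copy of the six rows printed above, standing in for the real `mapping` returned by getBM(), and `annotated` is just an illustrative name.

```r
# Toy copy of the head(mapping) output shown above; in practice `mapping`
# would come from getBM().
mapping <- data.frame(
  ensembl_gene_id = c("ENSG00000252303", "ENSG00000281771", "ENSG00000281256",
                      "ENSG00000283272", "ENSG00000280864", "ENSG00000280792"),
  hgnc_symbol = c("RNU6-280P", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-missing, non-empty symbol, then drop duplicates.
has_symbol <- !is.na(mapping$hgnc_symbol) & nzchar(mapping$hgnc_symbol)
annotated <- unique(mapping[has_symbol, , drop = FALSE])

print(annotated)
```

The same pattern should apply to other attributes that can be empty or NA for some genes, such as the Entrez gene ID.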
21 months ago, macmath (France) wrote:

The lines below also return the Entrez gene ID:

library(biomaRt)
# connect to the Ensembl BioMart and select the human gene dataset
mart = useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# hgnc_symbol is used here because mgi_symbol is a mouse attribute
bmIDs = getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
                             'description',
                             'chromosome_name',
                             'start_position',
                             'end_position',
                             'strand', 'hgnc_symbol', 'entrezgene'), mart = mart)
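To come back to the original question (Entrez IDs for protein-coding genes only), the answers above could be combined roughly as follows. This is a sketch under assumptions: the filter name `biotype` and the value `protein_coding` should be checked with listFilters(), and the Entrez attribute is called `entrezgene` in this thread but, as far as I know, `entrezgene_id` in newer Ensembl releases, so the sketch picks whichever is present. `pick_entrez_attr` and `get_protein_coding_entrez` are hypothetical helper names.

```r
# Hypothetical helper: choose the Entrez attribute name that the mart offers,
# preferring the newer spelling when both exist.
pick_entrez_attr <- function(available) {
  hit <- intersect(c("entrezgene_id", "entrezgene"), available)
  if (length(hit) == 0) stop("no Entrez gene attribute found")
  hit[1]
}

# Hypothetical helper: Ensembl gene ID to Entrez ID mapping for
# protein-coding genes (filter name and value are assumptions).
get_protein_coding_entrez <- function(mart) {
  entrez_attr <- pick_entrez_attr(biomaRt::listAttributes(mart)$name)
  res <- biomaRt::getBM(attributes = c("ensembl_gene_id", entrez_attr),
                        filters = "biotype", values = "protein_coding",
                        mart = mart)
  # Drop genes that have no Entrez ID.
  res[!is.na(res[[entrez_attr]]), , drop = FALSE]
}

# Possible usage (needs the biomaRt package and network access):
# library(biomaRt)
# mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# entrez <- get_protein_coding_entrez(mart)
```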