Biomart Bioconductor - Retrieving All Entrezgenes Of Hsapiens_Gene_Ensembl
9.4 years ago
sthait ▴ 100

Hello,

I'm working with the biomaRt package in R. I'm trying to retrieve all Entrez genes from the hsapiens_gene_ensembl dataset, filtering by gene type (protein coding) and returning the Entrez gene ID as the attribute.

So far I did the following:

library(biomaRt)
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

I'm not sure how to do this with the getBM function so that it is not restricted to a list of values but returns all values in the human dataset.

Thanks for your help,

Tom :)

biomart r bioconductor • 27k views
ADD COMMENT
8
Entering edit mode

Hi,

I have not used biomaRt for the last 2-3 months, but here is something I was using to play around:

library(biomaRt)
listMarts()                    # see which databases (marts) are available
ensembl <- useMart("ensembl")  # use the Ensembl database
listDatasets(ensembl)          # see which datasets are present in Ensembl
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)  # select the Homo sapiens gene dataset
listFilters(ensembl)           # check which filters are available
listAttributes(ensembl)        # check which attributes can be selected
# Get gene IDs and gene names. With no filters/values, getBM() returns every gene in the dataset.
# ("external_gene_id" was renamed to "external_gene_name" in later Ensembl releases.)
genes.with.id <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), mart = ensembl)
ADD REPLY
0
Entering edit mode

Thanks!

Let me be more specific: my goal is to download all FASTA sequences under the following conditions:

dataset: hsapiens_gene_ensembl
filter: gene type = protein coding
attributes: Ensembl gene ID, Ensembl transcript ID, associated gene name, chromosome name, strand, transcript start
sequences: 5' UTR, 3000 bp upstream flank

In the Ensembl BioMart web interface I got 21976/57945 matches and downloaded them as a gzipped FASTA file.

I would like to do the same with the biomaRt Bioconductor package in R.

I tried the getSequence function, but I don't know how to retrieve all sequences for hsapiens.

Thanks a lot,
tom

ADD REPLY
0
Entering edit mode

You just have to play around with the parameters for a while:

Get all genes for the current release (GRCh38 as of June 16, 2019):

library(biomaRt)
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
genes <- getBM(
  attributes=c("hgnc_symbol","entrezgene_id","chromosome_name","start_position","end_position"),
  mart = mart)
head(genes)
  hgnc_symbol entrezgene_id chromosome_name start_position end_position
1       MT-TF            NA              MT            577          647
2     MT-RNR1            NA              MT            648         1601
3       MT-TV            NA              MT           1602         1670
4     MT-RNR2            NA              MT           1671         3229
5      MT-TL1            NA              MT           3230         3304
6      MT-ND1          4535              MT           3307         4262

Then, obtain the 5' UTR sequences for genes based on their HGNC symbol:

getSequence(
  id = genes$hgnc_symbol[1000:1005],
  type = "hgnc_symbol",
  seqType = "5utr",
  mart = mart)

If you want bases upstream or downstream of the UTR, you can either use the functionality within getSequence() (see the upstream and downstream parameters), or you can obtain the 5' UTR coordinates with getBM() (as above), extend them by 3000 bp, and then call getSequence() with coordinates instead of id, like this:

getSequence(chromosome, start, end, ...)
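For the upstream/downstream route, here is a minimal sketch of how it could look. It assumes seqType = "gene_flank" is the flank you want (there is also "coding_gene_flank", and I believe getSequence() accepts either upstream or downstream in one call, not both). Asking for every gene at once may be slow or time out, so the sketch queries in batches; batch_ids and fetch_upstream are hypothetical helper names, and the batch size of 500 is an arbitrary choice.

```r
# Hypothetical helper: split a vector of IDs into batches of at most `size`.
batch_ids <- function(ids, size = 500) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Hypothetical wrapper: fetch `upstream` bp of flanking sequence for each gene,
# one batch at a time (needs biomaRt and network access when called).
fetch_upstream <- function(ids, mart, upstream = 3000) {
  parts <- lapply(batch_ids(ids), function(b) {
    biomaRt::getSequence(id = b,
                         type = "ensembl_gene_id",
                         seqType = "gene_flank",   # assumption: flank of the whole gene
                         upstream = upstream,
                         mart = mart)
  })
  do.call(rbind, parts)  # one data frame: sequence + ensembl_gene_id
}

# Usage (not run here):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# seqs <- fetch_upstream(c("ENSG00000139618", "ENSG00000141510"), mart)
# biomaRt::exportFASTA(seqs, file = "upstream_3000.fasta")
```

exportFASTA() writes the result in FASTA format, which should be close to what the BioMart web download gives you.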
ADD REPLY
3
Entering edit mode

Use a star in the values field (I need only the Entrez ID, but you can add more attributes here):

all.entrezgene <- unique( getBM(attributes = "entrezgene",
                    values = "*",
                    mart = ensembl) )
ADD REPLY
0
Entering edit mode

Did not work for me; see my answer.

ADD REPLY
1
Entering edit mode
5.1 years ago
thomaskuilman ▴ 820

I tried the method suggested by Stephane, but this did not work for me:

> library("biomaRt")
> ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mapping <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                   filters = "ensembl_gene_id" , values = list("*"), mart = ensembl)
> head(mapping)
[1] ensembl_gene_id hgnc_symbol    
<0 rows> (or 0-length row.names)

However, leaving out the filters and values did the trick for me:

> library("biomaRt")
> ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mapping <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), mart = ensembl)
> head(mapping)
  ensembl_gene_id hgnc_symbol
1 ENSG00000252303   RNU6-280P
2 ENSG00000281771            
3 ENSG00000281256            
4 ENSG00000283272            
5 ENSG00000280864            
6 ENSG00000280792            
> dim(mapping)
[1] 63325     2
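
To get back to the original question (protein-coding genes only), filtering on the biotype instead of on the gene ID should work. A minimal sketch, assuming the filter is called "biotype" and the attribute "entrezgene_id" in your Ensembl release (check with listFilters() and listAttributes()); unique_entrez and fetch_coding_genes are names made up for illustration:

```r
# Hypothetical helper: unique, non-missing Entrez IDs from a getBM() result.
unique_entrez <- function(bm) {
  ids <- bm$entrezgene_id
  unique(ids[!is.na(ids)])
}

# Hypothetical wrapper: all protein-coding genes with their Entrez IDs
# (needs biomaRt and network access when called).
fetch_coding_genes <- function() {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "entrezgene_id"),
                 filters    = "biotype",          # assumption: filter name in this release
                 values     = "protein_coding",
                 mart       = mart)
}

# Usage (not run here):
# coding     <- fetch_coding_genes()
# entrez_ids <- unique_entrez(coding)
# length(entrez_ids)
```

Genes without an Entrez mapping come back as NA, and one Ensembl gene can map to several Entrez IDs (and vice versa), which is why the helper drops NAs and de-duplicates.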
ADD COMMENT
1
Entering edit mode
4.4 years ago
macmath ▴ 160

The lines below also return the Entrez gene ID:

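require(biomaRt)
# Connect to the Ensembl mart and select the human gene dataset
mart = useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# 'hgnc_symbol' is the human gene symbol ('mgi_symbol' belongs to the mouse dataset);
# 'entrezgene' is called 'entrezgene_id' in more recent Ensembl releases
bmIDs = getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id',
                           'description',
                           'chromosome_name',
                           'start_position',
                           'end_position',
                           'strand','hgnc_symbol','entrezgene'),mart = mart)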
ADD COMMENT
1
Entering edit mode
16 months ago

Nowadays the attribute is called "entrezgene_id":

library(biomaRt)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# No filters/values: this returns the mapping for all genes in the dataset
genes <- getBM(attributes = c("hgnc_symbol", "entrezgene_id"),
  bmHeader = TRUE, mart = mart)
ADD COMMENT
0
Entering edit mode

Indeed, it has changed to entrezgene_id
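
Since attribute names change between releases, it can help to look the name up rather than hard-code it. A small sketch, assuming listAttributes() returns a data frame with a "name" column (it does in the versions I have seen); match_attributes is a made-up helper name:

```r
# Hypothetical helper: rows of a listAttributes() table whose name matches `pattern`.
match_attributes <- function(attrs, pattern) {
  attrs[grepl(pattern, attrs$name, ignore.case = TRUE), , drop = FALSE]
}

# Usage against a live mart (needs biomaRt and network access, not run here):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# match_attributes(biomaRt::listAttributes(mart), "entrez")
```

biomaRt also provides searchAttributes(mart, pattern = "entrez") and searchFilters() for the same purpose.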

ADD REPLY
