Question: How to get just protein_coding genes using biomart in R
Asked 4.6 years ago by M K (United States):

Dear all,

I would like some help getting just the protein_coding genes from a gene expression file using biomaRt. I have a file of expression values for all mouse (mm10) genes with Ensembl gene names, and I need to get rid of the non-coding genes and pseudogenes.

Tags: sequencing, rna-seq, R
Answer by cyril-cros (France), 4.6 years ago:

You can go to Ensembl BioMart and select the following attributes in the Gene section: Gene type and Transcript type. "protein_coding" is the type you want. Then just run something like grep "protein_coding" biomartResults.txt on the exported results and you should be set.
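Since the goal is to stay in R, the same filter can be applied to the exported BioMart table with read.delim() instead of grep. This is a minimal sketch, assuming the export is tab-separated with columns named "Gene stable ID" and "Gene type" (check the header of your own file); the IDs gene1 to gene3 are made-up placeholders, not real Ensembl IDs. Filtering on the Gene type column is stricter than grepping whole lines, which would also match rows where protein_coding appears only in another column.

```r
# Made-up stand-in for a BioMart web export (tab-separated, with a header line)
exportFile <- tempfile(fileext = ".txt")
writeLines(c("Gene stable ID\tGene type",
             "gene1\tprotein_coding",
             "gene2\tlncRNA",
             "gene3\tprocessed_pseudogene"),
           exportFile)

# check.names = FALSE keeps the original column names (with spaces) unchanged
biomartResults <- read.delim(exportFile, check.names = FALSE,
                             stringsAsFactors = FALSE)

# Keep only the rows whose gene type is exactly "protein_coding"
proteinCoding <- biomartResults[biomartResults[["Gene type"]] == "protein_coding", ]
print(proteinCoding)
```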


Reply by M K: I already have a file with all genes in it, and I want to use R to keep only the protein_coding genes from that file.


Reply by cyril-cros: You can use pretty much any tool for that. If you want to do it in R:

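library(biomaRt)
setwd("~/Downloads/")
# TRUE: reuse the saved query result if the file already exists
shouldImport <- TRUE
saveFile <- "proteinCodingMouseGenes.Rda"
if (!shouldImport || !file.exists(saveFile)) {
  print("Querying Biomart for protein coding genes")
  # Connect to Ensembl BioMart and select the mouse gene dataset
  ensembl <- useMart("ensembl")
  ensemblMouse <- useDataset("mmusculus_gene_ensembl", mart = ensembl)
  # Ask for ID, name and description of every gene whose biotype is protein_coding
  mouseProteinCodingGenes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "description"),
                                   filters = "biotype", values = "protein_coding",
                                   mart = ensemblMouse)
  # Cache the result so later runs can skip the slow query
  save(mouseProteinCodingGenes, file = saveFile)
} else {
  print("Loading genes from savefile")
  # Recreates the mouseProteinCodingGenes data frame in the session
  load(saveFile)
}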

The only essential part is the Ensembl query; the rest just saves the result of the BioMart query to a file so it can be loaded again (querying BioMart takes a while). biomaRt is the R library you want: you specify which mart and dataset you are using, then getBM() returns the requested attributes for every entry whose filters match the given values. listAttributes() and listFilters() list the attribute and filter names you can use.

The rest is just dataframe manipulation.
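As an example of that manipulation, once mouseProteinCodingGenes is loaded, keeping only the protein-coding rows of an expression table is a single %in% subset on the gene ID. This is a minimal sketch with toy data: expr and its gene_id column are hypothetical names for your own table, and the IDs are placeholders. It also strips version suffixes (such as ".2"), assuming your IDs may carry them while the BioMart ensembl_gene_id values do not. If your file uses gene symbols instead, compare against external_gene_name.

```r
# Toy stand-in for the getBM() result (same column names as the query above)
mouseProteinCodingGenes <- data.frame(
  ensembl_gene_id    = c("gene1", "gene3"),
  external_gene_name = c("GeneA", "GeneC"),
  stringsAsFactors   = FALSE
)

# Hypothetical expression table: one row per gene, IDs in a "gene_id" column
expr <- data.frame(
  gene_id = c("gene1.2", "gene2.1", "gene3.5"),
  sample1 = c(10, 5, 0),
  sample2 = c(12, 3, 1),
  stringsAsFactors = FALSE
)

# Drop any version suffix (".2", ".5", ...) so the IDs match the unversioned ones
expr$gene_id <- sub("\\..*$", "", expr$gene_id)

# Keep only the rows whose ID appears in the protein-coding list
exprCoding <- expr[expr$gene_id %in% mouseProteinCodingGenes$ensembl_gene_id, ]
print(exprCoding)
```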


Reply by M K: I ran your R code and it worked, but I had to replace ensembl=useMart("ensembl") with

myMart <- useMart("ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl", host="www.ensembl.org")

because of some changes in the Ensembl proxy. The problem is that I couldn't read the .Rda file. Is there a way to save it as text? What I need to do is use the merge function in R to merge this file with mine, so that only the protein_coding genes remain in my file.
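An .Rda file is meant to be read back in R with load(), which recreates the saved object, rather than opened as text. For a plain-text copy, write the data frame with write.table(); it can then be read back and merged with your own table. This is a minimal sketch with toy data: expr and gene_id are hypothetical names for your expression table, and a temporary file stands in for a real file name such as proteinCodingMouseGenes.txt. merge() performs an inner join by default, so only genes present in both tables are kept, which is the protein-coding filter you want.

```r
# Toy stand-in for the query result saved in the .Rda file
mouseProteinCodingGenes <- data.frame(
  ensembl_gene_id    = c("gene1", "gene3"),
  external_gene_name = c("GeneA", "GeneC"),
  stringsAsFactors   = FALSE
)

# Write a tab-separated text copy (temp file here; use a real path in practice)
txtFile <- tempfile(fileext = ".txt")
write.table(mouseProteinCodingGenes, file = txtFile, sep = "\t",
            row.names = FALSE)

# Read it back, e.g. in a later session
proteinCoding <- read.delim(txtFile, stringsAsFactors = FALSE)

# Hypothetical expression table with Ensembl IDs in a "gene_id" column
expr <- data.frame(gene_id = c("gene1", "gene2", "gene3"),
                   sample1 = c(10, 5, 0),
                   stringsAsFactors = FALSE)

# Inner join (merge's default): genes missing from either table are dropped
merged <- merge(expr, proteinCoding, by.x = "gene_id", by.y = "ensembl_gene_id")
print(merged)
```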


Reply by SmallChess, 19 months ago: Thanks. Worked perfectly!

Powered by Biostar version 2.3.0