Looking up Gene IDs in R
Given a list of gene names, I need to create a table containing the Ensembl ID, chromosome, start, and end of each gene.
## ens_id gene view chr start end
## 1: ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
## 2: ENSG00000237613 FAM138A Gene Expression chr1 36080 36081
## 3: ENSG00000186092 OR4F5 Gene Expression chr1 65418 69055
What command can I use to look up the Ensembl IDs and start/end locations of genes?
The biomaRt library is great for this.
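library(biomaRt)
genes <- c("MIR1302-2HG", "FAM138A", "OR4F5")
# Connect to the Ensembl genes mart for human
ensembl <- useEnsembl("genes", "hsapiens_gene_ensembl")
# Look up the genes by name (the "external_gene_name" filter)
gene_info <- getBM(
  attributes=c("ensembl_gene_id", "external_gene_name", "gene_biotype",
               "chromosome_name", "start_position", "end_position", "strand"),
  filters="external_gene_name", values=genes, mart=ensembl)
gene_info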
ensembl_gene_id external_gene_name gene_biotype chromosome_name
1 ENSG00000243485 MIR1302-2HG lncRNA 1
2 ENSG00000237613 FAM138A lncRNA 1
3 ENSG00000186092 OR4F5 protein_coding 1
start_position end_position strand
1 29554 31109 1
2 34554 36081 -1
3 65419 71585 1
See the documentation for more information.
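As a rough quality check, you can compare the names you asked for against the names that came back. As far as I know, getBM() returns rows only for values it can match, so anything missing from the result was probably not recognized. The sketch below is base R only: `find_unmatched` is a helper name I made up, the data frame stands in for a real getBM() result using the three genes above, and "NOT_A_GENE" is an invented name added just to show the check.

```r
# Return the requested names that have no row in the lookup result.
# `find_unmatched` is a hypothetical helper, not part of biomaRt.
find_unmatched <- function(requested, gene_info) {
  setdiff(unique(requested), gene_info$external_gene_name)
}

# Stand-in for a getBM() result, built from the three genes shown above.
gene_info <- data.frame(
  ensembl_gene_id = c("ENSG00000243485", "ENSG00000237613", "ENSG00000186092"),
  external_gene_name = c("MIR1302-2HG", "FAM138A", "OR4F5"),
  stringsAsFactors = FALSE
)

# "NOT_A_GENE" is a made-up name included only to demonstrate the check.
requested <- c("MIR1302-2HG", "FAM138A", "OR4F5", "NOT_A_GENE")
unmatched <- find_unmatched(requested, gene_info)
print(unmatched)
```

With a real query, pass your full gene vector and the data frame returned by getBM() instead of the stand-ins.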
Thank you, I was able to get this to work for a short list of genes. For my full list of genes, I get this error
My guess is that some of the gene names are not valid "external gene name" values. Is there a way to filter out those genes, or at least run a quality control check to see whether a given gene name is recognized by Ensembl?
Can you post the full command you're running?
Here is the code I am using. It works with the first few genes in my list:
But when I paste all my genes I get errors. It might be a syntax error somewhere.
I copy and paste the list of genes from a text file (see below). Is there a better way to load my genes into the genes object?
If you have all of your genes in a text file, load them into R programmatically instead of copying and pasting them.
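A minimal sketch of what that could look like, assuming the file has one gene name per line. The helper name `read_gene_list` is made up, and a temporary file stands in for your real gene list so the example runs on its own. Trimming whitespace and dropping blank lines also removes the kind of stray characters that tend to sneak in with copy/paste.

```r
# Hypothetical input: one gene name per line. A temporary file stands in
# for the real gene list so this sketch is self-contained.
gene_file <- tempfile(fileext = ".txt")
writeLines(c("MIR1302-2HG", "FAM138A ", "", "OR4F5", "OR4F5"), gene_file)

# Read a gene list, cleaning up whitespace, blank lines and duplicates.
# `read_gene_list` is a hypothetical helper name.
read_gene_list <- function(path) {
  genes <- trimws(readLines(path, warn = FALSE))  # strip stray whitespace
  genes <- genes[nzchar(genes)]                   # drop blank lines
  unique(genes)                                   # drop repeated names
}

genes <- read_gene_list(gene_file)
print(genes)
```

The resulting character vector can then be passed as the `values` argument of getBM(), replacing the temporary file with the path to your own text file.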
Thank you, this worked!