Looking up Gene IDs in R
Asked 15 months ago by cthangav

Hello, given a list of gene names, I need to create a table containing the Ensembl ID, chromosome, start, and end of each gene. Example:

##             ens_id        gene            view  chr start   end
## 1: ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
## 2: ENSG00000237613     FAM138A Gene Expression chr1 36080 36081
## 3: ENSG00000186092       OR4F5 Gene Expression chr1 65418 69055

What command can I use to look up Ensembl IDs and the start/end locations of genes?

15 months ago

The biomaRt package is well suited for this.

library("biomaRt")

# Gene symbols to look up
genes <- c("MIR1302-2HG", "FAM138A", "OR4F5")

# Connect to the Ensembl genes mart, human dataset
ensembl <- useEnsembl("genes", "hsapiens_gene_ensembl")

# Filter by gene symbol and return the listed attributes
gene_info <- getBM(
  mart=ensembl,
  attributes=c("ensembl_gene_id", "external_gene_name", "gene_biotype",
    "chromosome_name", "start_position", "end_position", "strand"),
  filters=list(external_gene_name=genes))

> gene_info
  ensembl_gene_id external_gene_name   gene_biotype chromosome_name
1 ENSG00000243485        MIR1302-2HG         lncRNA               1
2 ENSG00000237613            FAM138A         lncRNA               1
3 ENSG00000186092              OR4F5 protein_coding               1
  start_position end_position strand
1          29554        31109      1
2          34554        36081     -1
3          65419        71585      1

See the biomaRt documentation (the getBM help page and the package vignette) for more information.
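To get from the getBM() output to the column layout in the question (ens_id, gene, chr, start, end), the columns can be renamed and the chromosome given a "chr" prefix with base R. This is a minimal sketch: so that it runs offline, gene_info is rebuilt by hand from the three rows printed above rather than queried, and gene_table is a hypothetical name for the result. The "view" column from the question is not produced by biomaRt and is left out.

```r
# Rows copied from the getBM() output above so this runs without a network connection
gene_info <- data.frame(
  ensembl_gene_id    = c("ENSG00000243485", "ENSG00000237613", "ENSG00000186092"),
  external_gene_name = c("MIR1302-2HG", "FAM138A", "OR4F5"),
  chromosome_name    = c("1", "1", "1"),
  start_position     = c(29554L, 34554L, 65419L),
  end_position       = c(31109L, 36081L, 71585L),
  stringsAsFactors   = FALSE
)

# Rename to the layout from the question and prefix chromosome names with "chr"
gene_table <- data.frame(
  ens_id = gene_info$ensembl_gene_id,
  gene   = gene_info$external_gene_name,
  chr    = paste0("chr", gene_info$chromosome_name),
  start  = gene_info$start_position,
  end    = gene_info$end_position,
  stringsAsFactors = FALSE
)

print(gene_table)
```

With a live query, the hand-built data frame is simply replaced by the gene_info returned from getBM().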


Thank you, I was able to get this to work for a short list of genes. For my full list of genes, I get this error:

Error in getBM(mart = ensembl, attributes = c("ensembl_gene_id", "external_gene_name",  : 
  object 'genes' not found

My guess is that some of the gene names might not be valid "external gene names". Is there a way to filter out those genes, or at least run a quality-control check to see whether a given gene name is recognized by Ensembl?


Can you post the full command you're running?


Here is the code I am using; it works with the first few genes in my list:

> library("biomaRt")
> genes <- c("0610005C13Rik", "0610007P14Rik", "0610009B22Rik", "0610009L18Rik", "0610009O20Rik", "0610010B08Rik", "0610010F05Rik", "0610010K14Rik", "0610011F06Rik", "0610012G03Rik", "0610030E20Rik", "0610031O16Rik")
> ensembl <- useEnsembl("genes", "mmusculus_gene_ensembl")
> gene_info <- getBM(
+   mart=ensembl,
+   attributes=c("ensembl_gene_id", "external_gene_name", "gene_biotype",
+                "chromosome_name", "start_position", "end_position", "strand"),
+   filters=list(external_gene_name=genes))
> gene_info
     ensembl_gene_id external_gene_name   gene_biotype chromosome_name start_position end_position strand
1 ENSMUSG00000042208      0610010F05Rik protein_coding              11       23514961     23583639     -1
2 ENSMUSG00000107002      0610012G03Rik protein_coding              16       31765868     31767312     -1
3 ENSMUSG00000099146      0610031O16Rik         lncRNA               3      137916477    137946166     -1
4 ENSMUSG00000043644      0610009L18Rik         lncRNA              11      120239504    120242016      1
5 ENSMUSG00000007777      0610009B22Rik protein_coding              11       51576213     51579701     -1
6 ENSMUSG00000109644      0610005C13Rik         lncRNA               7       45217218     45224751     -1
7 ENSMUSG00000058706      0610030E20Rik protein_coding               6       72324300     72330131      1
8 ENSMUSG00000020831      0610010K14Rik protein_coding              11       70126032     70128740     -1
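For the quality-control check asked about earlier, the names that getBM() did not return can be listed with base R's setdiff(), and the query vector can be filtered with %in%. A minimal sketch: the two vectors below are copied from the query and output in this post so it runs offline; with a live result, `returned` would simply be gene_info$external_gene_name. Only 8 of the 12 names came back here; a missing name might, for example, be an older symbol or one absent from the Ensembl release queried, but that is a guess and worth checking per gene.

```r
# The 12 names queried in the post above
genes <- c("0610005C13Rik", "0610007P14Rik", "0610009B22Rik", "0610009L18Rik",
           "0610009O20Rik", "0610010B08Rik", "0610010F05Rik", "0610010K14Rik",
           "0610011F06Rik", "0610012G03Rik", "0610030E20Rik", "0610031O16Rik")

# external_gene_name column of the 8 rows getBM() returned above
returned <- c("0610010F05Rik", "0610012G03Rik", "0610031O16Rik", "0610009L18Rik",
              "0610009B22Rik", "0610005C13Rik", "0610030E20Rik", "0610010K14Rik")

# Queried names with no match in the result, in query order
not_found <- setdiff(genes, returned)
print(not_found)

# Keep only the names that were recognized
genes_ok <- genes[genes %in% returned]
```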

But when I paste in all of my genes, I get errors. It might be a syntax error somewhere:

 genes <- c("0610005C13Rik", "0610007P14Rik", "0610009B22Rik", "0610009L18Rik", "0610009O20Rik", "0610010B08Rik", "0610010F05Rik", "0610010K14Rik",...
+ gene_info <- getBM(
+   mart=ensembl,
+   attributes=c("ensembl_gene_id", "external_gene_name", "gene_biotype",
Error: unexpected symbol in:
"  mart=ensembl,
  attributes=c("ensembl_gene_id"
                "chromosome_name", "start_position", "end_position", "strand"),
Error: unexpected ',' in "               "chromosome_name","
filters=list(external_gene_name=genes))
Error: unexpected ')' in "  filters=list(external_gene_name=genes))"

I copy and paste the list of genes from a text file (linked below). Is there a better way to load my genes into the genes object? Text File

Thank you


If all of your genes are in a text file, load them into R programmatically instead of copying and pasting them.

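library("magrittr")  # provides the %>% pipe
library("stringr")
library("readr")

gene_file <- "genes.txt"

# Read the whole file as one string, then extract every name that sits
# between double quotes (e.g. a pasted c("...", "...") vector)
genes <- gene_file %>%
  read_file %>%
  str_extract_all('(?<=\\")[[:alnum:]\\-_.]+(?=\\")', simplify=TRUE) %>%
  as.character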
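If the file instead holds one gene name per line without quotes (an assumed format, not the one in the linked file), base R's readLines() is enough. A minimal sketch: a temporary file with a stray blank line and padded spaces stands in for a hypothetical genes.txt so the example is self-contained.

```r
# Stand-in for a hypothetical genes.txt with one unquoted name per line
gene_file <- tempfile(fileext = ".txt")
writeLines(c("0610005C13Rik", "  0610007P14Rik  ", "", "0610009B22Rik"), gene_file)

genes <- readLines(gene_file)
genes <- trimws(genes)        # strip leading/trailing whitespace
genes <- genes[genes != ""]   # drop blank lines
genes <- unique(genes)        # avoid querying the same name twice
print(genes)
```

Cleaning whitespace and blank lines before the query matters because a name with a trailing space would not match any external_gene_name.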

Thank you, this worked!
