Question

Extract ENSEMBL IDs from processed Seurat object instead of gene symbols

2

3.6 years ago

bioneer ▴ 40

Hi Biostars,

I have some prefiltered scRNAseq count data and want to extract counts for cells belonging to a specific cluster.

The following code retrieves the counts with gene symbols by default:

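# keep only the cells assigned to "Cluster 1"
cluster01 <- subset(data_integrated, idents = c("Cluster 1"))
# raw count matrix (genes x cells) from the RNA assay
cluster01_counts <- cluster01@assays$RNA@counts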

The output looks like this (column names truncated):

X1     AAAC.. AAAC.. AAAC..
A1BG        0      0      3
A1CF        0      0      0

Column X1 contains gene symbols.

Is there a way to extract ENSEMBL IDs directly instead of gene symbols?

scRNAseq Seurat R • 6.9k views

score 4 · Accepted Answer · 2021-11-24

3.6 years ago

fracarb8 ★ 1.7k

This is what I would do in your situation.

# Build a lookup table of Ensembl ID and symbol for each gene in your data (good to keep as a reference)
library(org.Hs.eg.db)
master_gene_table <- mapIds(org.Hs.eg.db, keys = cluster01_counts$X1, keytype = "SYMBOL", column = "ENSEMBL")
master_gene_table <- data.frame(symbol = names(master_gene_table), ensembl = unname(master_gene_table))

# Add the Ensembl IDs to the counts table as a new column (same row order as cluster01_counts$X1)
cluster01_counts$ensembleID <- master_gene_table$ensembl

Note that if you want to use the Ensembl IDs as rownames(cluster01_counts), you first need to remove duplicated IDs (and genes with no match, which come back as NA), because row names must be unique.
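
As a rough illustration of that last step, here is a minimal sketch in base R. It uses a small made-up table in place of the real counts, and the gene symbols, counts and IDs in it are placeholders rather than real annotations. It assumes the IDs sit in a column called ensembleID and simply keeps the first row for each ID; the name cluster01_unique is only for this example.

```r
# Toy stand-in for the counts table (all values are made up for illustration)
cluster01_counts <- data.frame(
  X1     = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  cell_1 = c(0, 0, 2, 1, 5),
  cell_2 = c(3, 0, 0, 4, 1)
)

# Placeholder IDs: GENE_C and GENE_D share one ID, GENE_E has no match (NA)
cluster01_counts$ensembleID <- c("ID_1", "ID_2", "ID_3", "ID_3", NA)

# Keep rows that have an ID and are the first occurrence of that ID
keep <- !is.na(cluster01_counts$ensembleID) &
  !duplicated(cluster01_counts$ensembleID)
cluster01_unique <- cluster01_counts[keep, ]

# The remaining IDs are unique, so they can now be used as row names
rownames(cluster01_unique) <- cluster01_unique$ensembleID
cluster01_unique$ensembleID <- NULL

print(cluster01_unique)
```

Keeping only the first occurrence throws away the counts of the other rows that map to the same ID. If those counts matter, summing the rows per ID (for example with rowsum()) may be a better choice.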

0

Thank you, fracarb8, your solution works very well! Thank you also for the hint to remove duplicated genes when converting to row names.

ADD REPLY • link 3.6 years ago by bioneer ▴ 40