Extract ENSEMBL IDs from processed Seurat object instead of gene symbols
2.4 years ago
bioneer ▴ 30

Hi Biostars,

I have some prefiltered scRNAseq count data and want to extract counts for cells belonging to a specific cluster.

The following code retrieves the counts with gene symbols by default:

cluster01 <- subset(data_integrated,idents = c("Cluster 1"))
cluster01_counts <- cluster01@assays$RNA@counts

The output looks like this (column names truncated):

X1     AAAC.. AAAC.. AAAC..
A1BG        0      0      3
A1CF        0      0      0

Column X1 contains gene symbols.

Is there a way to extract ENSEMBL IDs directly instead of gene symbols?

scRNAseq Seurat R • 5.6k views
2.4 years ago
fracarb8 ★ 1.6k

This is what I would do in your situation.

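# Build a reference table of gene symbol and Ensembl ID for each gene in your data (good to keep as a reference)
library(org.Hs.eg.db)
# cluster01_counts$X1 holds the gene symbols; use rownames(cluster01_counts) instead if the symbols are row names
master_gene_table <- mapIds(org.Hs.eg.db, keys = cluster01_counts$X1, keytype = "SYMBOL", column = "ENSEMBL")
master_gene_table <- data.frame(symbol = names(master_gene_table), ensembl = unname(master_gene_table))

# Add the Ensembl IDs to the counts table, matched by gene symbol (symbols with no match get NA)
cluster01_counts$ensemblID <- master_gene_table$ensembl[match(cluster01_counts$X1, master_gene_table$symbol)]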

Note that if you want to use the Ensembl IDs as rownames(cluster01_counts), you first need to remove duplicated IDs and genes with no match (NA), because the row names of a data frame must be unique and non-missing.
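
A minimal sketch of that clean-up step in base R, in case it helps. Everything here is made up for illustration: `toy_counts`, `symbol_to_ensembl`, the gene names, and the placeholder IDs are hypothetical stand-ins, not real lookups from org.Hs.eg.db. The mapping is shaped like the named vector that mapIds() returns.

```r
# Toy stand-in for the counts: gene symbols as row names, cells as columns
# (all names and values are illustrative placeholders)
toy_counts <- matrix(
  c(0, 0, 3,
    0, 0, 0,
    1, 2, 0,
    4, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical symbol -> Ensembl mapping: GENE_B has no match (NA),
# and GENE_C and GENE_D map to the same ID
symbol_to_ensembl <- c(GENE_A = "ENSG_PLACEHOLDER_1",
                       GENE_B = NA,
                       GENE_C = "ENSG_PLACEHOLDER_2",
                       GENE_D = "ENSG_PLACEHOLDER_2")

# Look up the ID for every row, in row order
ids <- unname(symbol_to_ensembl[rownames(toy_counts)])

# Keep rows that have an ID, and only the first row for each ID
keep <- !is.na(ids) & !duplicated(ids)
ensembl_counts <- toy_counts[keep, , drop = FALSE]
rownames(ensembl_counts) <- ids[keep]
```

Keeping only the first row per ID discards the counts of the other rows that share it. If that matters for your analysis, summing those rows (for example with rowsum()) is an alternative.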


Thank you, fracarb8, your solution works very well! Thank you also for the hint to remove duplicated genes when converting to row names.
