Seurat scRNA convert Ensembl ID to gene symbol
11 months ago
clizama • 0

Hi,

I downloaded some datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE155960).

I found that the genes are named with Ensembl IDs, and I need to convert them into gene symbols in order to compute the QC metrics in the Seurat pipeline.

I'm using this code to convert the Ensembl IDs to gene symbols:

library(Seurat)
library(patchwork)
library(dplyr)
library(biomaRt)
library(org.Hs.eg.db)
library(ggplot2)
library(Matrix)

# Read the raw counts; the first column of the CSV becomes the row names
countsData <- read.csv(file = "~/GSE155960_RAW/CD45N-L1.csv", header = TRUE, row.names = 1)

# Fetch Ensembl gene ID -> HGNC symbol pairs from Ensembl BioMart
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
bm <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), values = rownames(countsData), mart = ensembl)

# Replace the row names with the matching HGNC symbols
hgnc.symbols <- bm$hgnc_symbol[match(rownames(countsData), bm$ensembl_gene_id)]
countsData <- as.matrix(countsData)
rownames(countsData) <- hgnc.symbols

# CreateSeuratObject() expects genes as rows and cells as columns
CD45N_L1 <- CreateSeuratObject(counts = t(countsData), project = "H_Fat", min.cells = 3, min.features = 200)

CD45N_L1[["percent.mt"]] <- PercentageFeatureSet(CD45N_L1, pattern = "^MT-")
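The match() step in the code above leaves NA wherever an ID is absent from bm and an empty string wherever BioMart has no HGNC symbol, and two IDs can end up with the same symbol, whereas Seurat expects feature names to be unique and non-empty. A minimal sketch of a more defensive mapping, assuming a lookup table with the same two columns as the getBM() result: the helper ensembl_to_symbol() is hypothetical, falls back to the Ensembl ID when no symbol is found, and de-duplicates with make.unique(). ENSG_NO_SYMBOL and ENSG_NOT_IN_BM are made-up placeholder IDs.

```r
# Toy stand-in for the getBM() result (same column names).
# The first three IDs are the Ensembl IDs of MT-ND1, MT-ND2 and TP53;
# ENSG_NO_SYMBOL is a made-up placeholder for a gene without an HGNC symbol.
bm <- data.frame(
  ensembl_gene_id = c("ENSG00000198888", "ENSG00000198763",
                      "ENSG00000141510", "ENSG_NO_SYMBOL"),
  hgnc_symbol     = c("MT-ND1", "MT-ND2", "TP53", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map Ensembl IDs to HGNC symbols, keep the Ensembl ID
# when no symbol is available, and make the result unique for Seurat.
ensembl_to_symbol <- function(ids, bm) {
  sym <- bm$hgnc_symbol[match(ids, bm$ensembl_gene_id)]
  no_symbol <- is.na(sym) | sym == ""   # not in bm, or empty symbol
  sym[no_symbol] <- ids[no_symbol]
  make.unique(sym)                      # duplicates become X, X.1, ...
}

# ENSG_NOT_IN_BM is another made-up ID that is absent from the lookup table
ids <- c("ENSG00000198888", "ENSG00000198763", "ENSG00000141510",
         "ENSG_NO_SYMBOL", "ENSG_NOT_IN_BM")
new_names <- ensembl_to_symbol(ids, bm)
print(new_names)
# "MT-ND1" "MT-ND2" "TP53" "ENSG_NO_SYMBOL" "ENSG_NOT_IN_BM"
```

On the real data it would be applied to whichever dimension actually holds the Ensembl IDs, e.g. rownames(countsData) <- ensembl_to_symbol(rownames(countsData), bm) if genes are the rows.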

However, when I ran VlnPlot to look at percent.mt, every value was 0. The samples are human.

When I generated the bm table, I could see some MT- genes matched to their Ensembl IDs; however, after creating the Seurat object I don't see any MT- genes. I'm not sure why I'm losing the MT- genes during the conversion. I hope somebody has a solution.
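One possibility worth ruling out, assuming each CSV row is a cell and each column a gene (the t() inside CreateSeuratObject() hints at this, but the GSE155960 file layout is not verified here): match() would then run on cell barcodes, find nothing, and the Ensembl IDs in the columns would pass through untouched, so "^MT-" could not match any feature. The toy example uses made-up barcodes and counts, and the hypothetical helper looks_ensembl() simply checks which axis carries "ENSG" names.

```r
# Toy cells x genes table: rows are made-up cell barcodes, columns are
# Ensembl IDs. This mimics read.csv(..., row.names = 1) on a CSV with one
# cell per row, which is an assumption about the real files.
countsData <- data.frame(
  ENSG00000198888 = c(5, 0, 2),  # MT-ND1
  ENSG00000141510 = c(1, 3, 0),  # TP53
  row.names = c("AAACCTGA-1", "AAACGGGT-1", "AAAGATGC-1")
)
bm <- data.frame(ensembl_gene_id = c("ENSG00000198888", "ENSG00000141510"),
                 hgnc_symbol     = c("MT-ND1", "TP53"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: fraction of names that look like human Ensembl gene IDs
looks_ensembl <- function(x) mean(grepl("^ENSG", x))
looks_ensembl(rownames(countsData))  # 0: the rows are cells
looks_ensembl(colnames(countsData))  # 1: the columns are genes

# Matching the row names (barcodes) finds nothing, so every "symbol" is NA
by_row <- bm$hgnc_symbol[match(rownames(countsData), bm$ensembl_gene_id)]
# Matching the column names finds the symbols
by_col <- bm$hgnc_symbol[match(colnames(countsData), bm$ensembl_gene_id)]

# Rename the gene axis, then transpose to genes x cells for Seurat
countsMat <- as.matrix(countsData)
colnames(countsMat) <- by_col
genesByCells <- t(countsMat)
print(grep("^MT-", rownames(genesByCells), value = TRUE))
```

If the real data look like this, running the symbol lookup on colnames(countsData) instead of rownames(countsData) should bring the MT- genes back.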

Thanks.

11 months ago
Sasha ▴ 840

It seems the issue may come from renaming the row names of the countsData matrix. I'd recommend checking whether the mitochondrial genes are present in the row names both before and after you replace them with HGNC symbols. Here's a modified version of your code with those checks added:

library(Seurat)
library(patchwork)
library(dplyr)
library(biomaRt)
library(org.Hs.eg.db)
library(ggplot2)
library(Matrix)

countsData <- read.csv(file = "~/GSE155960_RAW/CD45N-L1.csv", header = TRUE, row.names = 1)
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
bm <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id", values = rownames(countsData), mart = ensembl)

# Check mitochondrial genes before conversion
print("Mitochondrial genes before conversion:")
print(grep("^MT-", rownames(countsData), value = TRUE))

hgnc.symbols <- bm$hgnc_symbol[match(rownames(countsData), bm$ensembl_gene_id)]
countsData <- as.matrix(countsData)
rownames(countsData) <- hgnc.symbols

# Check mitochondrial genes after conversion
print("Mitochondrial genes after conversion:")
print(grep("^MT-", rownames(countsData), value = TRUE))

CD45N_L1 <- CreateSeuratObject(counts = t(countsData), project = "H_Fat", min.cells = 3, min.features = 200)

CD45N_L1[["percent.mt"]] <- PercentageFeatureSet(CD45N_L1, pattern = "^MT-")

This will help you identify whether the mitochondrial genes are being lost during the conversion. If they are still present after the conversion, check the Seurat object creation step and the PercentageFeatureSet() call instead.
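PercentageFeatureSet() reports, for each cell, the percentage of its total counts that comes from features whose names match pattern. The base-R re-implementation below is an illustrative sketch, not Seurat's own code; the helper percent_mt() and the count values are made up. It shows why rows still named by Ensembl ID give 0 for every cell.

```r
# Toy genes x cells count matrix (values are made up)
counts <- matrix(
  c(10,  0,  5,
    30,  8,  0,
    60, 12, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-ND1", "MT-CO1", "ACTB"), c("cell1", "cell2", "cell3"))
)

# Hypothetical helper: per-cell percentage of counts from matching features
percent_mt <- function(counts, pattern = "^MT-") {
  is_mt <- grepl(pattern, rownames(counts))
  100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
}
print(percent_mt(counts))  # cell1 40, cell2 40, cell3 25

# Same counts, but rows named by the Ensembl IDs of MT-ND1, MT-CO1 and ACTB:
# nothing matches "^MT-", so every cell gets 0
ens <- counts
rownames(ens) <- c("ENSG00000198888", "ENSG00000198804", "ENSG00000075624")
print(percent_mt(ens))
```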

I'm using my chatbot here (https://tinybio.cloud) to help generate this answer. You can download it from the website.

Good luck with your research!
