Remove Ensembl identifiers from feature names in a Seurat object
4
1
3.8 years ago
V ▴ 380

Hello,

I took a 10x matrix from a collaborator and created a Seurat object. The issue, which I only realised when attempting to visualise my favourite genes, is that the original matrix has gene names in the format "gene name-Ensembl ID".

example :

Dcn-ENSMUSG00000019929.16, Inmt-ENSMUSG00000003477.5, Mfap4-ENSMUSG00000042436.12

So I cannot search for my genes of interest unless I know the ENSMUS identifier that follows each name. Is there a way to remove, at this stage, everything after (and including) the "-" so that I am left with only the gene name?

So convert "Dcn-ENSMUSG00000019929.16" to simply "Dcn".

Thank you :)

seurat rstudio RNA-Seq • 7.7k views
ADD COMMENT
0

Simply grep it while ensuring not to get false partial matches:

grep(paste0("^", YourGeneName, "-"), rownames(SeuratObject))
ADD REPLY
0

I've tried this and it only returns the row number of where my gene is :(
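By default grep() returns integer positions; the names themselves come back with value = TRUE. A minimal sketch on a plain character vector standing in for rownames(SeuratObject) (the names `features`, `gene`, `idx` and `full_name` are only illustrative):

```r
# Stand-in for rownames(SeuratObject); IDs taken from the question.
features <- c("Dcn-ENSMUSG00000019929.16",
              "Inmt-ENSMUSG00000003477.5",
              "Mfap4-ENSMUSG00000042436.12")
gene <- "Dcn"

# Anchor on "^<gene>-ENSMUS" so a symbol that is a prefix of another
# hyphenated symbol does not match by accident.
pattern <- paste0("^", gene, "-ENSMUS")

# Default behaviour: integer position(s) of the matching row names.
idx <- grep(pattern, features)

# value = TRUE returns the full feature name instead of its position.
full_name <- grep(pattern, features, value = TRUE)
```

The string in `full_name` is the full row name, so it should be usable wherever a feature name is expected, for example in the `features` argument of FeaturePlot().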

ADD REPLY
0

Hi, I am running into a similar problem in Seurat. My genes have the format ENSEMBL_gene_biotype (e.g. ENSG00000000003_TSPAN6_ProteinCoding), so I hit the same problem whenever I need a reference (cell cycle scoring, label transfer, ...). I would like to remove everything before and after the gene name (including the two "_"). Ideally I would also like to store the original format so that I can change it back later (to know the biotype).

Thanks so much for your help!

Best Julia
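One possible approach, sketched in base R on a plain character vector: split each ID into its three fields and keep them in a lookup table, so the original name and biotype can be recovered later. The names `ids`, `feature_map` and `new_name` are assumptions for illustration, and the second ID was added only to make the example longer than one row.

```r
# IDs in the ENSEMBL_gene_biotype format described above.
# The first comes from the post; the second is added for illustration.
ids <- c("ENSG00000000003_TSPAN6_ProteinCoding",
         "ENSG00000000005_TNMD_ProteinCoding")

# Split on the first and last underscore, so a gene symbol that itself
# contains "_" would stay intact in the middle group.
pattern <- "^([^_]+)_(.*)_([^_]+)$"

feature_map <- data.frame(
  original = ids,
  ensembl  = sub(pattern, "\\1", ids),
  symbol   = sub(pattern, "\\2", ids),
  biotype  = sub(pattern, "\\3", ids),
  stringsAsFactors = FALSE
)

# Symbols can repeat across Ensembl IDs, so make them unique before
# using them as row names.
feature_map$new_name <- make.unique(feature_map$symbol)

# Later: look up the biotype (or original ID) for a renamed feature.
tspan6_biotype <- feature_map$biotype[match("TSPAN6", feature_map$new_name)]
```

If the names have already passed through CreateSeuratObject, the underscores may have been replaced with dashes, in which case the pattern would need adjusting.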

ADD REPLY
2
3.8 years ago

Huh, you are quite right that my comment on jomo's answer doesn't work. Too used to SingleCellExperiment objects, where that does work. Anyway, there are two workarounds. The annoying one is to extract the counts from assay data and remake the Seurat object:

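library(Seurat)
library(purrr)  # provides map()

# Get counts
count.data <- GetAssayData(object = pbmc_small[["RNA"]], slot = "counts")
# as.matrix() makes a dense copy; it can be skipped for large data,
# since CreateSeuratObject also accepts sparse matrices
count.data <- as.matrix(count.data)

# Rename rows: keep only the part before "-ENSMUS"
rownames(count.data) <- unlist(map(strsplit(rownames(count.data), "-ENSMUS", fixed = TRUE), 1))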

# Generate new Seurat object.
new.obj <- CreateSeuratObject(
     count.data,
     project = "SeuratProject",
     assay = "RNA",
     min.cells = 0,
     min.features = 0,
     names.field = 1,
     names.delim = "_",
     meta.data = NULL
)

Obviously not ideal: you'll have to redo the normalization. The other option is to convert to a SingleCellExperiment object, rename the row names as above, then convert back to a Seurat object. This at least retains the normalization and any dimensionality reductions, metadata, clustering, etc.
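The SingleCellExperiment round trip could look roughly like the sketch below. It assumes the Seurat and SingleCellExperiment packages are installed and that the object holds "counts" and "logcounts" after conversion; `rename_via_sce` is a hypothetical helper name, not part of either package, and this has not been checked against every Seurat version.

```r
# Sketch only: calling this needs Seurat and SingleCellExperiment installed.
rename_via_sce <- function(obj, pattern = "-ENSMUS.*$") {
  # Seurat -> SingleCellExperiment
  sce <- Seurat::as.SingleCellExperiment(obj)

  # Row names can be assigned directly on a SingleCellExperiment.
  # make.unique() guards against symbols shared by several Ensembl IDs.
  rownames(sce) <- make.unique(sub(pattern, "", rownames(sce)))

  # SingleCellExperiment -> Seurat, reusing the existing normalized data
  Seurat::as.Seurat(sce, counts = "counts", data = "logcounts")
}
```

Usage would be something like `new.obj <- rename_via_sce(SeuratObject)`. It is worth checking afterwards that reductions and metadata came through as expected.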

ADD COMMENT
0

Worked like a charm. Thank you :)

ADD REPLY
0
3.8 years ago
jomo018 ▴ 720

Suppose geneNames is your vector of names, genes will be the required vector:

library(purrr)
genes <- unlist(map(strsplit(geneNames, "-ENSMUS", fixed = TRUE), 1))

You are splitting each name into two parts and purrr::map "grabs" the first member.
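For reference, the same result can be obtained in base R without purrr. This is a minimal sketch using the three names from the question plus one made-up hyphenated symbol ("Abc-1", hypothetical) to show why the split is anchored on "-ENSMUS" rather than on the first "-":

```r
geneNames <- c("Dcn-ENSMUSG00000019929.16",
               "Inmt-ENSMUSG00000003477.5",
               "Mfap4-ENSMUSG00000042436.12",
               "Abc-1-ENSMUSG00000000000.1")  # hypothetical hyphenated symbol

# Drop "-ENSMUS" and everything after it. Symbols that contain a
# hyphen of their own are left intact.
genes <- sub("-ENSMUS.*$", "", geneNames)
```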

ADD COMMENT
0

Thanks for this, but could you help on how this would be performed on the Seurat object?

ADD REPLY
0

Should just be: rownames(SeuratObject)=unlist(map(strsplit(rownames(SeuratObject),"-ENSMUS",fixed=T),1))

ADD REPLY
0

I think this does not work with complex objects like a Seurat object.

ADD REPLY
0
3.8 years ago

Ensembl IDs are always unique, but gene names are not, so stripping the IDs can leave duplicate row names, which could cause problems. 10x usually produces a matrix whose row names are Ensembl IDs, but software like Loupe and Seurat can work with gene names from that data. Is your matrix really not made like that?
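A quick way to check for this before renaming anything, sketched in base R. The vector below is made up: "Gm1" is a hypothetical symbol that appears twice to mimic two Ensembl IDs sharing one gene name.

```r
# Hypothetical cleaned names with one repeated symbol.
genes <- c("Dcn", "Gm1", "Inmt", "Gm1")

# Which symbols occur more than once?
dup <- unique(genes[duplicated(genes)])

# make.unique() appends ".1", ".2", ... to repeats so the names can
# safely be used as row names.
unique_genes <- make.unique(genes)
```

Whether a suffixed name is acceptable, or the duplicated rows should be merged or dropped instead, depends on the analysis.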

ADD COMMENT
0
3 months ago
TigerSheng • 0

This should work.

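# NEW_NAMES: character vector of unique names, same length and order as the current row names.
# Assumes an Assay object with counts, data and meta.features slots.
RNA <- obj@assays$RNA
RNA@counts@Dimnames[[1]] <- NEW_NAMES
RNA@data@Dimnames[[1]] <- NEW_NAMES
rownames(RNA@meta.features) <- NEW_NAMES
# scale.data and var.features are not touched here.
obj@assays$RNA <- RNA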
ADD COMMENT
