Question

Remove ensembl identifiers from Seurat

3.8 years ago

V ▴ 380

Hello,

I took a 10x matrix from a collaborator and created a Seurat object. The issue, which I only realised when trying to visualise my favourite genes, is that the original matrix has gene names in the format "gene name-Ensembl ID".

Example:

Dcn-ENSMUSG00000019929.16, Inmt-ENSMUSG00000003477.5, Mfap4-ENSMUSG00000042436.12

So I cannot search for my genes of interest unless I know the corresponding ENSMUSG identifier that follows each name. Is there a way to remove, at this stage, everything after (and including) the "-" so that I am left with only the gene name?

So convert "Dcn-ENSMUSG00000019929.16" to simply "Dcn".

Thank you :)

seurat rstudio RNA-Seq • 7.7k views

ADD COMMENT • link updated 3 months ago by TigerSheng • 0 • written 3.8 years ago by V ▴ 380

Simply grep it while ensuring not to get false partial matches:

grep(paste0("^", YourGeneName, "-"), rownames(SeuratObject))
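
As a minimal sketch of how this behaves: by default grep() returns the position of the match, while value = TRUE returns the full feature name, which is what plotting functions would need. The vector features below is a toy stand-in for rownames(SeuratObject), built from the names in the question.

```r
# Toy stand-in for rownames(SeuratObject), using the names from the question
features <- c("Dcn-ENSMUSG00000019929.16",
              "Inmt-ENSMUSG00000003477.5",
              "Mfap4-ENSMUSG00000042436.12")

gene <- "Dcn"
pattern <- paste0("^", gene, "-")  # anchor at the start to avoid partial matches

# Default: the row index of the matching feature
idx <- grep(pattern, features)

# value = TRUE: the full feature name instead of the index
full_name <- grep(pattern, features, value = TRUE)
```

The full name in full_name can then be passed wherever a feature name is expected. Note that a gene symbol containing regex metacharacters would need escaping before being pasted into the pattern.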

ADD REPLY • link 3.8 years ago by ATpoint 81k

I've tried this and it only returns the row number of where my gene is :(

ADD REPLY • link 3.8 years ago by V ▴ 380

Hi, I am running into a similar problem in Seurat: My genes have the following format ENSEMBL-gene-biotype (e.g. ENSG00000000003_TSPAN6_ProteinCoding) so I run into the same problem whenever I need a reference (cell cycle scoring, label transfer,...). I would like to remove everything before and after the gene name (including the two "_"). And ideally I would like to store this original format so that I can later change it again (to know the biotype).

Thanks so much for your help!

Best Julia

ADD REPLY • link 19 months ago by Julia • 0
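
For the three-part format described in this comment, one possible approach is to split each name with a regular expression and keep a lookup table, so the original name and biotype stay recoverable. This is only a sketch: it assumes the Ensembl ID and the biotype never contain "_", and the names ids, pattern, and lookup are made up for illustration.

```r
# Toy input in the "ENSEMBL_symbol_biotype" format from the comment
ids <- c("ENSG00000000003_TSPAN6_ProteinCoding")

# First field = Ensembl ID, last field = biotype, everything between = symbol
pattern <- "^([^_]+)_(.+)_([^_]+)$"

# Lookup table that preserves the original name alongside its parts
lookup <- data.frame(
  original = ids,
  ensembl  = sub(pattern, "\\1", ids),
  symbol   = sub(pattern, "\\2", ids),
  biotype  = sub(pattern, "\\3", ids),
  stringsAsFactors = FALSE
)
```

Here lookup$symbol holds the bare gene names, and matching back on lookup$original or lookup$ensembl recovers the biotype later.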

3.8 years ago

jomo018 ▴ 720

Suppose geneNames is your vector of names, genes will be the required vector:

library(purrr)
genes <- unlist(map(strsplit(geneNames, "-ENSMUS", fixed = TRUE), 1))

Each name is split into two parts at "-ENSMUS", and purrr::map "grabs" the first element of each.
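
A base R alternative, sketched here without purrr, is to delete the suffix with sub(). Matching on "-ENSMUS" rather than on the first "-" matters for symbols that themselves contain a hyphen. The second entry below uses a placeholder Ensembl ID that is not a real identifier and is there only to illustrate that case.

```r
# Toy vector; the second Ensembl ID is a made-up placeholder
geneNames <- c("Dcn-ENSMUSG00000019929.16",
               "H2-Ab1-ENSMUSG00000000000.1")

# Remove "-ENSMUS" and everything after it, keeping hyphens inside the symbol
genes <- sub("-ENSMUS.*$", "", geneNames)
```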

ADD COMMENT • link 3.8 years ago by jomo018 ▴ 720

Thanks for this, but could you help on how this would be performed on the Seurat object?

ADD REPLY • link 3.8 years ago by V ▴ 380

I am afraid not. See: https://github.com/satijalab/seurat/issues/2617

ADD REPLY • link 3.8 years ago by jomo018 ▴ 720

Should just be: rownames(SeuratObject)=unlist(map(strsplit(rownames(SeuratObject),"-ENSMUS",fixed=T),1))

ADD REPLY • link 3.8 years ago by jared.andrews07 ★ 16k

I think this does not work with complex objects like a Seurat object.

ADD REPLY • link 3.8 years ago by jomo018 ▴ 720

3.8 years ago

swbarnes2 14k

Ensembl IDs are always unique, but gene names are not, so stripping the IDs could leave non-unique row names and cause problems. 10x usually produces a matrix whose row names are Ensembl IDs, and software like Loupe and Seurat can still work with gene names from that data. Is your matrix really not made like that?
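
A quick way to check for this, sketched on a toy vector with a deliberate duplicate, is to list repeated symbols and, if needed, disambiguate them with base R's make.unique(). The names symbols, dup, and unique_symbols are illustrative only.

```r
# Toy symbols after stripping the Ensembl IDs; "Dcn" is duplicated on purpose
symbols <- c("Dcn", "Inmt", "Dcn")

# Which symbols occur more than once?
dup <- unique(symbols[duplicated(symbols)])

# Append ".1", ".2", ... to repeated entries so row names stay unique
unique_symbols <- make.unique(symbols)
```

Whether suffixing duplicates is acceptable depends on the analysis; keeping a mapping back to the Ensembl IDs avoids losing track of which row is which.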

ADD COMMENT • link 3.8 years ago by swbarnes2 14k

3 months ago

TigerSheng • 0

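This should work, where NEW_NAMES is a character vector of the new gene names, in the same order as the existing rows: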

RNA <- obj@assays$RNA
RNA@counts@Dimnames[[1]] = NEW_NAMES
RNA@data@Dimnames[[1]] = NEW_NAMES
rownames(RNA@meta.features) = NEW_NAMES
obj@assays$RNA <- RNA

ADD COMMENT • link 3 months ago by TigerSheng • 0

score 2 · Accepted Answer · 2020-06-23

Huh, you are quite right that my comment on jomo's answer doesn't work. Too used to SingleCellExperiment objects, where that does work. Anyway, there are two workarounds. The annoying one is to extract the counts from assay data and remake the Seurat object:

library(Seurat)
library(purrr)

# Get counts (pbmc_small is the example object shipped with Seurat)
count.data <- GetAssayData(object = pbmc_small[["RNA"]], slot = "counts")
count.data <- as.matrix(count.data)

# Rename rows: keep only the part before "-ENSMUS"
rownames(count.data) <- unlist(map(strsplit(rownames(count.data), "-ENSMUS", fixed = TRUE), 1))

# Generate new Seurat object.
new.obj <- CreateSeuratObject(
     count.data,
     project = "SeuratProject",
     assay = "RNA",
     min.cells = 0,
     min.features = 0,
     names.field = 1,
     names.delim = "_",
     meta.data = NULL
)

Obviously not ideal, since you'll have to redo the normalization. The other option is to convert to a SingleCellExperiment object, rename the rownames as above, then convert back to a Seurat object. This at least retains the normalization and any dimensionality reductions, metadata, clustering, etc.
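
A rough sketch of that second route, assuming the Seurat and SingleCellExperiment packages are installed and the object has normalized data (so a "logcounts" assay exists after conversion). The helper names strip_ensembl and rename_via_sce are hypothetical, and how much of the object survives the round trip may depend on the Seurat version.

```r
# Pure helper: drop "-ENSMUS..." and everything after it
strip_ensembl <- function(x) sub("-ENSMUS.*$", "", x)

# Hypothetical wrapper: Seurat -> SingleCellExperiment -> rename -> Seurat
rename_via_sce <- function(obj) {
  sce <- Seurat::as.SingleCellExperiment(obj)
  # make.unique() guards against symbols that are duplicated after stripping
  rownames(sce) <- make.unique(strip_ensembl(rownames(sce)))
  # Assumes "counts" and "logcounts" assays are present in the converted object
  Seurat::as.Seurat(sce, counts = "counts", data = "logcounts")
}
```

Usage would be along the lines of new.obj <- rename_via_sce(SeuratObject), followed by checking rownames(new.obj) and the retained reductions and metadata.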