How to add Ensembl ids after Pseudobulk analysis by DESeq2
18 days ago
Sara ▴ 30

Hi all,

I ran a pseudobulk analysis with DESeq2 on data from my Seurat object, and I am having trouble converting gene names to Ensembl IDs. My row names are mixed: some are ENSG IDs and some are gene symbols. I would like to have Ensembl IDs and chromosome names for all of them. Here is the relevant part of my DESeq2 code for the pseudobulk analysis:

dds <- DESeqDataSetFromMatrix(countData = counts_bcell,
                              colData = colData,
                              design = ~Age+Sex+condition)

keep <- rowSums(counts(dds)) >=10
dds <- dds[keep,]

colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")

#run DESeq2
dds <- DESeq(dds, test = "LRT", reduced = ~Age+Sex)

#check the coefficients for the comparison
resultsNames(dds)

#Generate result object
res <- results(dds, name = "condition_Patient_vs_Control")
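#map gene symbols to Ensembl IDs (annotation package assumed here to be org.Hs.eg.db, i.e. human data)
mapped <- data.frame(GeneName = rownames(res),
                     ensemblID = mapIds(org.Hs.eg.db, keys = rownames(res), keytype = "SYMBOL", column = "ENSEMBL"))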

res$ensembl_gene_id <- mapped$ensemblID

Looking at mapped (below), the row names that are already ENSG IDs get NA instead of an Ensembl ID, and so do some of the gene symbols:

> mapped
                       GeneName       ensemblID
ENSG00000238009 ENSG00000238009            <NA>
ENSG00000241860 ENSG00000241860            <NA>
ENSG00000290385 ENSG00000290385            <NA>
ENSG00000291215 ENSG00000291215            <NA>
ENSG00000229905 ENSG00000229905            <NA>
LINC01409             LINC01409            <NA>
ENSG00000290784 ENSG00000290784            <NA>
FAM87B                   FAM87B ENSG00000177757
LINC00115             LINC00115            <NA>

Any suggestions, or is there a better way to add the Ensembl ID, chromosome name, and biotype?

I appreciate your help. Many thanks!

Seurat Pseudobulk single-cell DESeq2 scRNA
17 days ago

Go back to your original counts matrix or input data and assign consistent IDs during its generation.

I used Seurat, and in my Seurat object the gene names are mixed: some are gene symbols and some are ENSG IDs. I then did the pseudobulk analysis. How can I convert them, or add Ensembl IDs as an extra column in Seurat?

Why is your GeneName column in mapped a mix of Ensembl IDs and gene names? What Jared means is that during preprocessing you should already have made sure that only one consistent identifier (Ensembl IDs) is present, and not this wild mix. From a consistent identifier it is easy to convert, e.g. by loading a GTF file that contains both ID and name, and then doing a left join with that.
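The left join mentioned above can be sketched in base R as follows. This is only an illustration: `gtf_genes` is a small hand-made stand-in for a gene-level table extracted from a GTF, `res_df` stands in for `as.data.frame(res)`, the fold changes are made up, and the column names (`chromosome`, `gene_biotype`) are assumptions that depend on the GTF used.

```r
# Toy stand-in for a gene-level annotation table taken from a GTF.
# In practice it could come from something like
#   as.data.frame(rtracklayer::import("genes.gtf"))   # hypothetical path
# restricted to rows with type == "gene". Column names differ between
# sources (e.g. gene_type vs gene_biotype), so check your own file.
gtf_genes <- data.frame(
  gene_id      = c("ENSG00000238009", "ENSG00000237491", "ENSG00000177757"),
  gene_name    = c("ENSG00000238009", "LINC01409", "FAM87B"),
  chromosome   = c("chr1", "chr1", "chr1"),
  gene_biotype = c("lncRNA", "lncRNA", "lncRNA"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a results table keyed by Ensembl ID only.
# The last ID is hypothetical and deliberately absent from gtf_genes.
res_df <- data.frame(
  ensembl_gene_id = c("ENSG00000238009", "ENSG00000237491",
                      "ENSG00000177757", "ENSG00000999999"),
  log2FoldChange  = c(0.4, -1.2, 0.1, 0.8),   # made-up values
  stringsAsFactors = FALSE
)

# Left join via match(): keeps every row of res_df in its original order
# and fills NA where the annotation has no entry.
idx <- match(res_df$ensembl_gene_id, gtf_genes$gene_id)
annot_cols <- c("gene_name", "chromosome", "gene_biotype")
annotated <- res_df
annotated[annot_cols] <- gtf_genes[idx, annot_cols]

print(annotated)
```

Using `match()` rather than `merge()` keeps the rows in the same order as the results table, so the annotation columns can also be assigned straight back onto a DESeq2 results object.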

This is exactly my point. You should not have gotten data to this state unintentionally, so you need to double-check what was done upstream to see where the swaps occurred and rectify it at that point.
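One way to find where the mix appears is to count, at each upstream step, how many feature names look like Ensembl IDs. A minimal sketch using the row names shown in the question; the regular expression assumes human `ENSG` IDs without a version suffix:

```r
# Row names copied from the `mapped` output in the question.
row_ids <- c("ENSG00000238009", "ENSG00000241860", "ENSG00000290385",
             "ENSG00000291215", "ENSG00000229905", "LINC01409",
             "ENSG00000290784", "FAM87B", "LINC00115")

# TRUE for names that look like unversioned human Ensembl gene IDs.
is_ensembl <- grepl("^ENSG[0-9]{11}$", row_ids)

n_ensembl <- sum(is_ensembl)    # names that are Ensembl IDs
n_other   <- sum(!is_ensembl)   # names that are something else (symbols)

print(table(is_ensembl))
print(row_ids[!is_ensembl])
```

Running the same check on the row names of the raw count matrix, the Seurat object, and the pseudobulk matrix (`counts_bcell` in the question) should show at which step the two kinds of names start to coexist.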

It was 10x data, and I processed it using Seurat. Then I got to the pseudobulk step with DESeq2. Does this mean I have to check which parameters were used in Cell Ranger, or do I have to check or change something in my Seurat analysis?
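A possible place to look, as far as I know: the Cell Ranger output folder contains a `features.tsv.gz` with one column of Ensembl IDs and one of gene names, and Seurat's `Read10X()` uses the name column by default (`gene.column = 2`). For genes without a symbol, the name column may simply repeat the Ensembl ID, which would explain the mix. The sketch below is an assumption-laden illustration: the table is typed in by hand instead of read from the real file, and it assumes names were made unique and had underscores replaced by dashes on the way into Seurat.

```r
# Toy stand-in for Cell Ranger's features.tsv.gz
# (no header; columns: Ensembl ID, gene name, feature type).
# With real data this would be read from the file in the matrix folder.
features <- read.delim(
  text = paste(
    "ENSG00000238009\tENSG00000238009\tGene Expression",
    "ENSG00000237491\tLINC01409\tGene Expression",
    "ENSG00000177757\tFAM87B\tGene Expression",
    "ENSG00000225880\tLINC00115\tGene Expression",
    sep = "\n"
  ),
  header = FALSE,
  col.names = c("ensembl_id", "gene_name", "feature_type"),
  stringsAsFactors = FALSE
)

# Rebuild the feature names as they would appear in the Seurat object
# (assumption: duplicates made unique, then "_" replaced by "-").
features$seurat_name <- gsub("_", "-", make.unique(features$gene_name))

# Mixed row names as they appear in the DESeq2 results.
row_ids <- c("ENSG00000238009", "LINC01409", "FAM87B", "LINC00115")

# Look up the Ensembl ID for every row name, symbol or not.
ensembl_id <- features$ensembl_id[match(row_ids, features$seurat_name)]

print(data.frame(row_ids, ensembl_id))
```

If this lookup resolves every row name, nothing needs to be rerun in Cell Ranger; the alternative is to reload the matrix with `Read10X(..., gene.column = 1)` so that Ensembl IDs are the row names from the start.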


