Question

How can I identify cells in a single-cell experiment (SCE) using a gene signature from a bulk RNA-seq experiment (RNAseq)?

4.7 years ago

sofiagreen72211 ▴ 30

Hello Everyone,

I have two experiments: single-cell (SCE) and bulk RNA-seq (RNA). Each experiment has two groups: control (ctr) and treated (trt).

Experiment 1: I used the Seurat pipeline to cluster the SCE data and identify cell types.

Experiment 2: I used DESeq2 to identify significantly up- and down-regulated genes (say n = 50) between trt and ctr in the RNA-seq data.

Now I want to use these 50 significant bulk RNA-seq genes to identify which cells in the SCE data follow a similar expression signature. If all 50 genes were up-regulated (or all down-regulated), I could take their average expression in each cell and compare the cells on that value. Here, however, some genes are up-regulated and some are down-regulated, so the mean of their fold changes is not a good summary of the RNA-seq signature: the positive and negative values cancel each other out.
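To make the cancellation concrete, here is a minimal Python sketch with made-up toy values (not from the original post). It also shows one direction-aware alternative, assumed here for illustration: z-score each gene across cells, then score each cell as the mean z-score of the up-regulated genes minus the mean z-score of the down-regulated genes, so that both directions push a matching cell's score up. The names `bulk_lfc`, `expr`, and `signed_signature_score` are hypothetical.

```python
from statistics import mean, pstdev

# Toy bulk log2 fold changes (made-up): the naive mean is exactly 0
bulk_lfc = {"G1": 2.0, "G2": 1.5, "G3": -3.5}
naive_mean = mean(bulk_lfc.values())  # up and down genes cancel out


def zscore_genes(expr):
    """Z-score each gene across cells; expr maps gene -> per-cell values."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # a gene with zero variance gets z = 0 in every cell
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def signed_signature_score(expr, up_genes, down_genes):
    """Per-cell score = mean z of up genes - mean z of down genes."""
    z = zscore_genes(expr)
    n_cells = len(next(iter(expr.values())))
    return [mean(z[g][i] for g in up_genes) - mean(z[g][i] for g in down_genes)
            for i in range(n_cells)]


# Toy normalized expression: 3 genes x 3 cells (made-up values)
expr = {"G1": [5.0, 1.0, 3.0], "G2": [4.0, 0.0, 2.0], "G3": [0.0, 4.0, 2.0]}
up = [g for g, lfc in bulk_lfc.items() if lfc > 0]
down = [g for g, lfc in bulk_lfc.items() if lfc < 0]
scores = signed_signature_score(expr, up, down)
print(naive_mean, scores)
```

Cell 0 (high G1/G2, low G3) gets a positive score and cell 1 a negative one, while the naive mean of the fold changes is 0.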

What would be the best parameter to summarize a signature that contains both positive and negative values?

I would greatly appreciate any feedback, hint or suggestion.

Thank you,
Sofia

RNA-Seq • next-gen

updated 10 months ago by Ram 43k • written 4.7 years ago by sofiagreen72211 ▴ 30

score 1 · Answer 1 · 2019-08-05

4.7 years ago

Kristoffer Vitting-Seerup ★ 4.0k

Ah, sorry, I misunderstood your original question, so I'll add a new answer. A common way to solve this problem is to generate a signature for each of your bulk conditions (say, the 50-100 most significantly up- and down-regulated genes; each set is a signature of the condition in which those genes are more highly expressed) and then use a single-sample scoring tool such as ssGSEA from the GSVA R package. This gives you a score for each of the two signatures in every single cell, and you then classify each cell as the type with the highest score.
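GSVA's ssGSEA is an R function and is not reproduced here. As a hedged stand-in, the Python sketch below uses a simpler mean-rank score (not the ssGSEA random-walk statistic) to illustrate the same "score every cell for each signature, then take the highest" logic. Within each cell, genes are ranked by expression, a signature's score is the mean normalized rank of its genes, and the cell gets the label of the best-scoring signature. It assumes the "trt" signature is the genes up in trt and the "ctr" signature is the genes down in trt; gene names and values are toy placeholders.

```python
def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mean_rank_score(cell_expr, genes, signature):
    """Mean rank of the signature genes in one cell, scaled to (0, 1]."""
    rank_of = dict(zip(genes, average_ranks(cell_expr)))
    return sum(rank_of[g] for g in signature) / (len(signature) * len(genes))


def classify_cells(cells, genes, signatures):
    """Label each cell with the signature that scores highest in it."""
    labels = []
    for cell_expr in cells:
        scores = {name: mean_rank_score(cell_expr, genes, sig)
                  for name, sig in signatures.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy example: 4 genes, 2 cells (made-up values)
genes = ["A", "B", "C", "D"]
signatures = {"trt": ["A", "B"], "ctr": ["C", "D"]}  # up in trt / down in trt
cells = [[9, 8, 1, 0], [0, 1, 7, 9]]
print(classify_cells(cells, genes, signatures))  # ['trt', 'ctr']
```

Rank-based scores like this do not depend on the absolute scale of each cell's counts, which is one reason single-sample methods work on ranks.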

written 4.7 years ago by Kristoffer Vitting-Seerup ★ 4.0k

Thank you for your advice and the link. Greatly appreciated!

written 4.7 years ago by sofiagreen72211 ▴ 30

score 1 · Answer 2 · 2019-08-05

4.7 years ago

jared.andrews07 ★ 16k

There is a new version of SingleR (see the vignette for examples) under development that could handle this quite easily. All it needs is your bulk (DESeq2) count matrix, the SCE count matrix, and your marker gene list. It assigns an identity to each single cell based on the expression profiles in your training set (your DESeq2 counts).
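Roughly, SingleR's core step correlates each cell with each reference profile over the marker genes (Spearman correlation) and assigns the best-matching label. The real package adds fine-tuning and other refinements that are not shown here. The minimal pure-Python sketch below covers only that correlation step. It assumes the bulk trt and ctr profiles (for example, mean normalized counts per group) serve as the references, and that all vectors cover the same marker genes in the same order. Function names and numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie run
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def annotate(cells, references):
    """Assign each cell the reference label with the highest Spearman correlation."""
    labels = []
    for cell in cells:
        scores = {label: spearman(cell, ref) for label, ref in references.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy bulk profiles and cells over the same 4 marker genes (made-up values)
references = {"ctr": [10, 10, 50, 60], "trt": [60, 55, 10, 12]}
cells = [[0, 1, 5, 7], [6, 4, 0, 1]]
print(annotate(cells, references))  # ['ctr', 'trt']
```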

written 4.7 years ago by jared.andrews07 ★ 16k

Thank you for your advice and the link. Greatly appreciated!

written 4.7 years ago by sofiagreen72211 ▴ 30

score 0 · Answer 3 · 2019-08-02

4.7 years ago

Kristoffer Vitting-Seerup ★ 4.0k

One solution could be to correlate the log2FC of the bulk DE genes with the corresponding log2FC in the single-cell data. That takes both the magnitude and the sign of the values into account.
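Below is a minimal sketch of this idea. It assumes the single-cell log2FC is computed per gene from the mean normalized expression of trt cells versus ctr cells (for example, within one cluster), with a pseudocount of 1. The helper names and toy numbers are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def log2fc(trt_cells, ctr_cells, pseudocount=1.0):
    """Per-gene log2((mean trt + pc) / (mean ctr + pc)); inputs are lists of per-cell vectors."""
    n_genes = len(trt_cells[0])
    fcs = []
    for g in range(n_genes):
        m_trt = mean(cell[g] for cell in trt_cells)
        m_ctr = mean(cell[g] for cell in ctr_cells)
        fcs.append(math.log2((m_trt + pseudocount) / (m_ctr + pseudocount)))
    return fcs


# Toy data for 3 DE genes (made-up values), same gene order everywhere
bulk_lfc = [2.0, 1.5, -3.5]
trt_cells = [[7, 3, 0], [7, 5, 2]]
ctr_cells = [[1, 1, 7], [1, 1, 7]]
sc_lfc = log2fc(trt_cells, ctr_cells)
print(round(pearson(bulk_lfc, sc_lfc), 3))
```

A correlation close to 1 means the single-cell group shifts in the same direction as the bulk signature, gene by gene.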

written 4.7 years ago by Kristoffer Vitting-Seerup ★ 4.0k

Thank you for your suggestion. However, I don't want to do an overall comparison of the SCE log2FC (trt/ctr) against the RNA-seq log2FC.

The main goal here is to classify every single cell in the SCE data using the RNA-seq signature of significant genes: either a cell follows the signature (say, score = 1) or it does not (score = -1).

I can estimate the log2FC (trt/ctr) from the RNA-seq data. For the SCE, I have raw, normalized, and scaled datasets. How can I estimate a log2FC for each cell, given that the cell labels are unknown?
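One hedged way to get a per-cell value without group labels (an assumption, not something proposed in the thread) is to compare each cell with a baseline. Here the baseline is the mean expression of each gene across all cells. The sketch computes a per-cell log2 ratio with a pseudocount, then calls +1 or -1 from the sign of its correlation with the bulk log2FC over the signature genes. Using only control-sample cells as the baseline would be an alternative. Names and toy values are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def per_cell_log2fc(cells, pseudocount=1.0):
    """log2((x + pc) / (gene mean over all cells + pc)) for every cell and gene."""
    n_genes = len(cells[0])
    baseline = [mean(cell[g] for cell in cells) for g in range(n_genes)]
    return [[math.log2((cell[g] + pseudocount) / (baseline[g] + pseudocount))
             for g in range(n_genes)]
            for cell in cells]


def follows_signature(cells, bulk_lfc):
    """+1 if a cell's log2 ratios correlate positively with the bulk log2FC, else -1.

    A cell whose ratios are constant gets r = 0 and is called -1 here.
    """
    return [1 if pearson(fc, bulk_lfc) > 0 else -1
            for fc in per_cell_log2fc(cells)]


# Toy data: 3 cells x 3 signature genes (made-up values)
bulk_lfc = [2.0, 1.5, -3.5]
cells = [[7, 5, 0], [1, 0, 7], [7, 3, 1]]
print(follows_signature(cells, bulk_lfc))  # [1, -1, 1]
```

Single-cell counts are sparse, so correlations computed per cell over about 50 genes are likely to be noisy. The signature-scoring approaches in the answers may be more robust.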

Again, thank you for your time and suggestion. Any further suggestions will be greatly appreciated.

Thanks, Sofia

written 4.7 years ago by sofiagreen72211 ▴ 30