Question: Converting scRNA-Seq to Bulk-RNASeq
Asked 4 months ago by Assa Yeroslaviz (1.4k):

For the initial analysis of our data set we would like to convert our single-cell RNA-Seq data into bulk RNA-Seq data by summarizing the number of reads per gene per sample.

I was wondering whether anyone already has experience with this kind of analysis.

Would it make sense to calculate the average expression of each gene in each sample (by dividing the summed counts by the number of cells in the sample), or should we just take the sum() over all the cells in each sample?

With the count matrix created this way we would like to apply standard RNA-Seq analyses such as DESeq2 (for differential expression) or Mfuzz (for time-series analysis).


The term you're looking for is "pseudo bulk" and you'll want to sum values across cells.
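A minimal base R sketch of that summing step, assuming a genes-by-cells count matrix and a vector that assigns each cell to a sample. The toy matrix `counts` and the labels in `sample_of_cell` are made up for illustration; a real data set would typically hold a large sparse matrix and need a sparse-aware equivalent.

```r
# Toy genes-by-cells count matrix (hypothetical data: 3 genes x 5 cells).
counts <- matrix(
  c(1, 0, 2, 4, 3,
    0, 5, 1, 0, 2,
    7, 1, 0, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:5))
)

# Hypothetical sample label for each cell, in the same order as the columns.
sample_of_cell <- c("s1", "s1", "s2", "s2", "s2")

# rowsum() sums rows by group, so transpose to cells-by-genes,
# sum the cells of each sample, then transpose back to genes-by-samples.
pseudobulk <- t(rowsum(t(counts), group = sample_of_cell))

# Store as integers, since the sums are still raw counts.
storage.mode(pseudobulk) <- "integer"

print(pseudobulk)
```

Each column of `pseudobulk` is then one sample, which is the shape that bulk tools expect.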

written 4 months ago by Devon Ryan (97k)


Regarding your question: what I did in the past was to compare different clusters of scRNA-seq against bulk RNA-seq using correlation indices (it did not work as expected!). For that purpose, we averaged the expression values per cluster per condition/sample using functions from the Seurat R package:

library(Seurat)

# Split by 'stim', a factor in the metadata with levels 'sample_1' and 'sample_2'
sobjList <- SplitObject(data, split.by = "stim")

## Average gene expression values per cluster for sample_1 and sample_2
samp_1_averClt <- AverageExpression(object = sobjList$sample_1,
                                    assays = "integrated",
                                    slot = "data",
                                    return.seurat = TRUE)

samp_2_averClt <- AverageExpression(object = sobjList$sample_2,
                                    assays = "integrated",
                                    slot = "data",
                                    return.seurat = TRUE)

Of course this will give you average expression values per cluster for each level of the stim variable. This is not exactly what you want, but if you have scRNA-seq data I would do differential gene expression analysis between different cell populations/clusters rather than on the whole thing.


modified 4 months ago • written 4 months ago by antonioggsousa (1.5k)

Thanks. After searching for the term "pseudo bulk" I found more information, but it seems to me that, as António mentioned above, it all relates to calculating DE between clusters.

What we would like to do, though, is a differential expression analysis on the complete data set. Our problem is that we are not yet sure whether the clustering results are correct. For that reason we would like to first do a "standard" pseudo bulk RNA-Seq analysis on the complete data set by converting each sample (of course with differing numbers of cells) into a single column in the new count matrix. Some samples differ hugely in their total number of cells (up to 10-fold, 9K vs. 90K), so I'm not sure whether simply summing over all cells would create too big a difference between the samples.

This is why I was hoping that taking the average over all cells would give a more comparable value for each gene across all samples.

Does it make sense? Or do you still think I should take the sum across all cells in each sample?
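A small sketch of how the two options relate, using made-up numbers (the matrix `sums` and the cell counts `n_cells` are hypothetical). The per-sample mean is the per-sample sum divided by the number of cells, so it rescales each column without changing the relative abundance of genes within a sample, but it turns integer counts into fractions.

```r
# Hypothetical pseudo bulk sums: genes in rows, samples in columns.
sums <- cbind(s1 = c(geneA = 1, geneB = 5, geneC = 8),
              s2 = c(geneA = 9, geneB = 3, geneC = 6))

# Hypothetical number of cells that went into each sample.
n_cells <- c(s1 = 2, s2 = 3)

# Mean per cell = sum divided by the number of cells, column by column.
means <- sweep(sums, 2, n_cells, "/")

# Within each sample, the share of each gene is the same for sums and means.
same_proportions <- isTRUE(all.equal(prop.table(sums, 2), prop.table(means, 2)))

# The means are generally no longer whole numbers.
is_whole <- all(means == round(means))

print(means)
print(c(same_proportions = same_proportions, is_whole = is_whole))
```

Count-based tools such as DESeq2 expect integer counts as input, which is one reason to keep the sums and leave the depth difference to the normalization step.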

modified 4 months ago • written 4 months ago by Assa Yeroslaviz (1.4k)

Although 10-fold is quite a big difference, the normalization procedure of DESeq2 should compensate for the different read depth and, therefore, for this difference. I believe that a PCA or a sample-to-sample heatmap should show whether this approach has removed any potential bias caused by the samples' different read depth/coverage.
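To illustrate why a depth difference between summed samples need not be a problem, here is a minimal base R sketch of median-of-ratios size factors, the kind of normalization DESeq2 applies by default. The matrix `pb`, the sample names and the helper `median_of_ratios` are made up for illustration; in a real analysis DESeq2 estimates the size factors itself.

```r
# Hypothetical pseudo bulk sums: sample "big" has 10x the counts of "small".
small <- c(geneA = 10, geneB = 40, geneC = 200)
pb <- cbind(small = small, big = small * 10)

# One gene that truly differs: 2x higher in "big" once depth is accounted for.
pb["geneC", "big"] <- 4000

# Median-of-ratios size factors written out in base R (illustration only,
# not a call into DESeq2): compare each sample to the per-gene geometric mean.
median_of_ratios <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  keep <- is.finite(log_geo_mean)  # drop genes with a zero in any sample
  apply(m, 2, function(col) exp(median((log(col) - log_geo_mean)[keep])))
}

sf <- median_of_ratios(pb)

# Divide each column by its size factor to put samples on a common scale.
normalized <- sweep(pb, 2, sf, "/")

print(sf)
print(normalized)
```

In this toy case the size factors differ by a factor of 10, so geneA and geneB come out equal in both samples after normalization, while geneC keeps its 2-fold difference.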

Do you know why you have such a large difference? I guess it is related to the number of cells in one sample versus another, but it is still an order of magnitude higher.


modified 4 months ago • written 4 months ago by antonioggsousa (1.5k)