Converting scRNA-Seq to Bulk-RNASeq
10 months ago
Assa Yeroslaviz ★ 1.6k

For the initial analysis of our data set, we would like to convert our single-cell RNA-Seq data into bulk RNA-Seq data by summarizing the number of reads per gene per sample.

I was wondering if anyone already has some experience with this kind of analysis.

Would it make sense to calculate the average expression for each gene in each sample (by dividing by the number of cells in the sample), or should we just take the sum() over all the cells in each sample as is?

Using the count matrix created this way, we would like to apply standard RNA-Seq analyses such as DESeq2 (for differential expression) or Mfuzz (for time-series analysis).

scRNASeq bulk-RNASeq RNA-Seq single-cell • 1.6k views

The term you're looking for is "pseudo bulk" and you'll want to sum values across cells.
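
A minimal base-R sketch of that summing step, assuming a genes x cells count matrix and one sample label per cell. The toy matrix and the names `counts`, `sample_of_cell`, and `pseudobulk` are made up for illustration and do not come from the original data:

```r
# Toy genes x cells count matrix; names and values are illustrative only.
set.seed(1)
counts <- matrix(rpois(4 * 6, lambda = 3), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))

# Hypothetical per-cell sample labels (one entry per column of `counts`).
sample_of_cell <- c("A", "A", "A", "A", "B", "B")

# rowsum() sums rows by group, so transpose to cells x genes first,
# then transpose back to get a genes x samples pseudo-bulk matrix.
pseudobulk <- t(rowsum(t(counts), group = sample_of_cell))

# Sums of integer counts stay integers, which count-based tools expect;
# per-cell averages generally would not be integers.
storage.mode(pseudobulk) <- "integer"
print(pseudobulk)
```

The result has one column per sample, so it should be usable as the count matrix for something like DESeq2's `DESeqDataSetFromMatrix()`. Note that DESeq2 expects non-negative integer counts, which is one practical reason to sum rather than average.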


Hi,

Regarding your question, what I did in the past was to compare different scRNA-seq clusters against bulk RNA-seq using correlation indexes (it did not work as expected!). For that purpose, we averaged the read counts per cluster per condition/sample using Seurat R package functions:

sobjList <- SplitObject(data, split.by = "stim") # 'stim': factor variable with levels 'sample_1' and 'sample_2'

## Average gene expression values per cluster, separately for sample_1 and sample_2
samp_1_averClt <- AverageExpression(object = sobjList$sample_1,
                                    assays = "integrated",
                                    slot = "data",
                                    return.seurat = TRUE)
samp_2_averClt <- AverageExpression(object = sobjList$sample_2,
                                    assays = "integrated",
                                    slot = "data",
                                    return.seurat = TRUE)


Of course, this will give you average read counts per cluster for each level of the 'stim' variable. This is not exactly what you want, but if you have scRNA-seq data, I would do differential gene expression analysis between different cell populations/clusters rather than on the whole data set.

António


Thanks. After searching for the term "pseudo bulk" I found more information, but it seems to me that, as António mentioned above, it all relates to calculating DE between clusters.

What we would like to do, though, is a differential expression analysis on the complete data set. The problem is that we are not yet sure the clustering results are correct. For that reason, we would like to first do a "standard" pseudo-bulk RNA-Seq analysis on the complete data set by converting each sample (each with a different number of cells, of course) into a single column in the new count matrix. For some samples, the total number of cells differs hugely (up to 10-fold, 9K vs. 90K), so I'm worried that simply summing over all cells will create too big a difference between the samples.

This is why I was hoping that taking the average over all cells would give a more comparable value for each gene across all samples.

Does that make sense? Or do you still think I should take the sum across all cells in each sample?


Although 10-fold is quite a big difference, the normalization procedure of DESeq2 should mitigate the difference in read depth and, therefore, this difference in cell numbers. I believe that a PCA or sample-to-sample heatmap should show whether this approach removed any potential bias caused by differing read depth/coverage between samples.
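
To illustrate why summing need not be a problem, here is a rough base-R sketch of median-of-ratios size factors, the idea behind DESeq2's default normalization. The matrix `pb` and its values are invented for illustration, with `sampleB` built to have roughly 10 times the depth of `sampleA`. This is a simplified re-implementation under those assumptions, not DESeq2's own code:

```r
# Toy pseudo-bulk counts: sampleB is roughly sampleA with 10x more cells summed.
pb <- cbind(sampleA = c(10, 200, 35, 80, 5),
            sampleB = c(110, 1900, 360, 820, 48))
rownames(pb) <- paste0("gene", 1:5)

# Per-gene log geometric mean across samples serves as the reference.
log_geo_mean <- rowMeans(log(pb))

# Size factor per sample: median ratio of its counts to the reference.
size_factors <- apply(pb, 2, function(col) {
  exp(median(log(col) - log_geo_mean))
})

# Dividing each column by its size factor puts the samples on a common scale.
normalized <- sweep(pb, 2, size_factors, "/")
print(round(size_factors, 2))
print(round(normalized, 1))
```

After this scaling the two toy samples have similar values per gene despite the depth difference. With real data, `estimateSizeFactors()` in DESeq2 performs this step, so the sketch is only meant to show the principle.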

Do you know why you have such a large difference? I guess it is related to the number of cells in one sample versus another, but it is still an order of magnitude higher.

António