Question

Producing Bulk samples from 10X data

7 months ago

rohitsatyam102 ▴ 850

Hi Everyone

I am aware of an approach called pseudobulking in single-cell analysis, where bulk-like samples are generated from scRNA-seq data (in the absence of bulk data) to find which genes might be important at the population level. However, my boss asked for something different, and I am not sure whether it is a correct way to generate bulk samples.

I was asked to sample 60% of the total reads from the FASTQs of 10X data (UMI data, 3' chemistry) to generate three replicates per sample, align them to the Plasmodium reference, run DESeq2 for DE analysis, and check the overlap of the resulting DEGs with the DEGs obtained from scRNA-seq (all clusters combined). I did what was asked, and the resulting replicates look like ideal biological replicates. However, the dispersion estimates look weird (I understand there will be essentially no dispersion, given that the replicates are almost identical). Nearly 66% of the detected genes come out as differentially expressed. In addition, 60% of the scRNA-seq DEGs overlap with these artificial bulk-derived DEGs. Is this a good result?
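To illustrate the concern, here is a minimal sketch of the subsampling described above. The gene names and counts are hypothetical, reads are kept independently with probability 0.6 rather than drawn from real FASTQs, and the dispersion is a crude method-of-moments estimate under a negative binomial model (variance = mean + dispersion × mean²), not DESeq2's estimator. Under these assumptions, the three "replicates" differ only by sampling noise, so the estimated dispersion should sit at or below roughly zero.

```python
import random
from statistics import mean, variance

def subsample(counts, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    return {
        gene: sum(rng.random() < fraction for _ in range(n))
        for gene, n in counts.items()
    }

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical per-gene read counts for ONE sequenced library.
library = {"geneA": 200, "geneB": 2000, "geneC": 20000}

# Three "replicates", each a 60% subsample of the same library.
replicates = [subsample(library, 0.6, rng) for _ in range(3)]

# Crude moment estimate of NB dispersion: (var - mean) / mean^2.
dispersion = {}
for gene in library:
    values = [rep[gene] for rep in replicates]
    m, v = mean(values), variance(values)
    dispersion[gene] = (v - m) / m**2

for gene, d in dispersion.items():
    print(gene, round(d, 5))
```

With no biological variability between replicates, almost any shift in the mean looks significant, which would be consistent with a very large fraction of genes being called differentially expressed.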

I am confused about whether what I have been asked to do is even legitimate.

[Image: dispersion estimate plot]

rnaseq scrnaseq deseq2 bulk seurat • 449 views

updated 7 months ago by ATpoint 82k • written 7 months ago by rohitsatyam102 ▴ 850

score 2 · Answer 1 · 2023-09-10

The dispersion plot, as you say, is expected, because you are creating pseudoreplication. Pseudobulks are typically created from the count matrix: you sum the raw counts per cluster, cell type, group, or whatever makes sense. The pseudoreplication you are creating makes no sense to me. If you don't have replication, you cannot make it up.
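As a rough illustration of summing raw counts from the count matrix, here is a minimal sketch in plain Python. The cell names, gene names, donor labels, and the `pseudobulk` helper are all hypothetical. Grouping by (sample, cluster) is an assumption chosen so that each pseudobulk column corresponds to a distinct biological sample. In practice this would be done on the real count matrix with the tooling you already use.

```python
from collections import defaultdict

def pseudobulk(counts, cell_to_group):
    """Sum raw per-cell counts into one profile per group."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        group = cell_to_group[cell]
        for gene, n in gene_counts.items():
            bulk[group][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(genes) for group, genes in bulk.items()}

# Hypothetical raw UMI counts per cell.
counts = {
    "cell1": {"geneA": 3, "geneB": 0},
    "cell2": {"geneA": 1, "geneB": 2},
    "cell3": {"geneA": 0, "geneB": 5},
    "cell4": {"geneA": 4, "geneB": 1},
}

# Hypothetical (sample, cluster) label for each cell.
cell_to_group = {
    "cell1": ("donor1", "cluster0"),
    "cell2": ("donor1", "cluster0"),
    "cell3": ("donor2", "cluster0"),
    "cell4": ("donor2", "cluster0"),
}

bulk = pseudobulk(counts, cell_to_group)
for group, genes in bulk.items():
    print(group, genes)
```

Each resulting profile can then be treated as one column of a bulk-style count matrix. The replicates are only meaningful if the samples being summed are genuinely independent biological samples.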