I am aware of an approach called pseudobulking in single-cell analysis, where bulk-like samples are generated from scRNA-seq data (in the absence of bulk data) to find which genes might be important at the population level. But my boss asked for something different, and I am not sure whether it is a correct way to generate bulks.
I was asked to sample 60% of the total reads from the FASTQs of 10x data (UMI data, 3' chemistry) to generate three replicates per sample, then align them to the Plasmodium reference, run DESeq2 for DE analysis, and check the overlap of the resulting DEGs with the DEGs obtained from scRNA-seq (all clusters combined). I did what was asked, and I get what look like ideal biological replicates. But the dispersion estimates look weird (I understand there will be essentially no dispersion, given that the replicates are almost identical). I observe that nearly 66% of the detected genes are differentially expressed. In addition, 60% of the scRNA-seq DEGs overlap with these artificial bulk-derived DEGs. Is this a good result?
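To illustrate my concern about the dispersion, here is a minimal Python sketch (not my actual pipeline). It assumes that subsampling 60% of reads can be approximated by independent binomial thinning of per-gene counts, and it ignores alignment and UMI deduplication. The function name `subsample_replicates` and the simulated library are hypothetical.

```python
import numpy as np

def subsample_replicates(counts, fraction=0.6, n_reps=3, seed=0):
    """Approximate read subsampling by binomial thinning of per-gene counts.

    Each replicate is drawn independently from the SAME library, so the
    replicates differ only by sampling noise, not by biology.
    """
    rng = np.random.default_rng(seed)
    return np.stack(
        [rng.binomial(counts, fraction) for _ in range(n_reps)], axis=1
    )

# Hypothetical per-gene counts for one sample (simulated, not real data).
rng = np.random.default_rng(1)
true_counts = rng.negative_binomial(n=2, p=0.01, size=5000)

reps = subsample_replicates(true_counts)      # shape: genes x replicates
mean = reps.mean(axis=1)
var = reps.var(axis=1, ddof=1)

# Restrict to reasonably expressed genes to avoid dividing by ~0.
expressed = mean > 10
var_to_mean = float(np.mean(var[expressed] / mean[expressed]))

# A Poisson process would give var/mean near 1; thinning a fixed count
# gives roughly (1 - fraction), i.e. even less variance than Poisson.
print(f"average variance/mean across replicates: {var_to_mean:.2f}")
```

Under this assumption the between-replicate variance is below even the Poisson level, so I would expect any negative-binomial dispersion estimate to be close to zero, which seems consistent with the odd dispersion plot I am seeing.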
I am confused about whether what I have been asked to do is even legitimate.