What is the best practice for analysing single-cell data with low sequencing depth?
5 months ago
volkanergin ▴ 10

Hi everyone,

I have a quick question about my recent single-cell datasets that I hope you can help with. I am about to start analyzing two datasets (Control vs. Treat; libraries were prepared with the 10X 3' gene expression platform and sequenced on a HiSeq X, PE150). Based on the Cell Ranger metrics, each sample yielded an average of 20,000 cells, 6,000 reads per cell, and 600 genes per cell. Given these numbers, what pipeline or strategy would you recommend for analyzing the data properly? For earlier datasets with more reads and genes per cell, I followed a Seurat-based pipeline including merging datasets, integration, SCTransform, and finding DEGs, etc. Should I follow the same approach to extract variable genes between the Control and Treat samples? Or would it be better to resequence the libraries on a platform with deeper capacity, such as a NovaSeq S4 (PE150)?

Thank you,
Volkan

Tags: 10X • depth • scRNA-seq • Sequencing • Seurat • data

Consider using alevin-fry, for reasons that should be apparent after you read this: https://www.biorxiv.org/content/10.1101/2021.06.29.450377v1

5 months ago
GenoMax 110k

An S4 flowcell for just two samples may be overkill, in addition to being extremely expensive. A billion reads from an SP or S1 flowcell may be enough (how many total reads were there in the initial run?). If your libraries were not that good to begin with, then sequencing deeper is not likely to fix that fundamental problem.

Just to be sure: do you have your numbers backwards by chance, i.e. should it be 6,000 cells with 20,000 reads per cell? 10x recommends targeting a maximum of 10,000 cells.
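A quick back-of-the-envelope check (plain Python; the numbers are taken from this thread) shows why the totals alone cannot distinguish the two readings: 20,000 cells at 6,000 reads/cell and 6,000 cells at 20,000 reads/cell both imply the same 120M-read total, so only the Cell Ranger barcode count can settle which is correct.

```python
# Both interpretations of the metrics imply the same read total,
# so the per-sample total alone cannot tell them apart.
total_reads = 120_000_000       # per-sample total reported later in the thread

reported = 20_000 * 6_000       # 20,000 cells at 6,000 reads/cell
swapped = 6_000 * 20_000        # 6,000 cells at 20,000 reads/cell

print(reported, swapped, reported == swapped == total_reads)
# -> 120000000 120000000 True
```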


Thank you both for your recommendations. I'll look into alevin-fry to see how I can benefit from it. Briefly, the libraries were of good quality, with no degradation or oligo artifacts based on Agilent QC analysis. After the HiSeq X run, I got an average of 120M total reads per sample for duplicates of the Control and Treat samples (four samples total were run in a single lane). So I take it you agree that I can proceed with a new S4 run as long as the libraries are intact and properly prepared.

By the way, I recovered around 20K cells from the dissociated tissues, but I chose not to count the cells before starting the 10X protocol because they were all the cells I had and I didn't want to lose any. My dissociation pre-tests usually yielded around 8-10K cells, so I assumed a similar amount; for the actual experiment, however, I processed freshly collected tissue, which I think is why the yield doubled. Anyway, thank you for your suggestions.


I am generally wary of exceeding recommendations from companies like 10x. You explained why you did it, but I wonder whether it affected the number of genes detected (only 600 in your case). It sounds like you are willing to go forward with an S4 run, so let us know what you find out.


Thank you GenoMax for your reply; I'll let you know for sure once I decide how to proceed. I think running two samples per lane instead of four on the HiSeq X would help give a broader view of the DEGs.

5 months ago

6,000 reads per cell is very low for 10X data. 10X recommends 20K as a minimum, and people more typically aim for 50K per cell for decent results. I'd try to get more reads if you can (though alevin-fry is also an excellent quantification choice).
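The depth targets above translate into a rough sequencing budget. A minimal sketch, assuming the 20,000 recovered cells stated in the question (if the true count is closer to 6,000, scale the totals down accordingly):

```python
# Rough per-sample sequencing budget, assuming 20,000 recovered cells
# (the figure from the original question).
cells = 20_000

for reads_per_cell in (20_000, 50_000):   # 10x minimum vs. typical target
    total = cells * reads_per_cell
    print(f"{reads_per_cell:>6} reads/cell -> {total / 1e9:.1f}B reads per sample")
# -> 0.4B reads/sample at the 20K minimum, 1.0B at the 50K target
```

At 20,000 cells, hitting 50K reads/cell means roughly a billion reads per sample, which is the scale GenoMax's flowcell suggestion is aimed at.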
