Hello! I'm running some samples for a collaborator. They sequenced small intestine cells post-infection, and even prior to sequencing the cells were quite apoptotic. 10k cells were loaded per sample, and per sample only 3k, 4k, and 5k cells were recovered.
I'm trying to figure out the QC cutoffs for the samples now, as the collaborator would like to move forward with the given samples. I see a lot of cells with high MT content, but "okay" feature and gene expression.
Any suggestions on how stringent I should be with the filtering? Does it also make more sense to do it on a sample-by-sample basis? Or is this data unusable?
Thanks!
Hard to say with just these plots. What is notable is that most cells express very few genes (below 1000), which I would say is concerning. Still, I cannot tell without the data at hand.
Yes, I always iterate by sample, and if I have annotations, then even by broad cell type within each sample. For example, structural cells often express 5000 or more genes, while for neutrophils we would expect maybe 1000. Mixing them in a single violin plot might inflate the margins so much that the cutoffs are not optimal for either cell type. For %mt, something like 10% usually works. Then check downstream whether the data make sense. Really hard to say here.
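To make the per-sample idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the QC records are made up, "D0" is a hypothetical sample name, and the cutoffs are placeholders (10% MT as mentioned above, with a looser 15% for one sample only to show that thresholds can differ between samples). In practice the values would come from whatever toolkit you use and from looking at each sample's distributions.

```python
from collections import defaultdict

# Made-up per-cell QC records: (sample, genes detected, percent mitochondrial).
cells = [
    ("D0", 1800, 4.2),
    ("D0", 650, 22.5),
    ("D0", 2400, 8.9),
    ("D8", 900, 14.0),
    ("D8", 1200, 9.5),
    ("D8", 300, 45.0),
]

# Illustrative per-sample cutoffs (assumptions, not recommendations).
cutoffs = {
    "D0": {"min_genes": 500, "max_pct_mt": 10.0},
    "D8": {"min_genes": 500, "max_pct_mt": 15.0},
}


def filter_per_sample(cells, cutoffs):
    """Keep cells that pass the cutoffs defined for their own sample."""
    kept = defaultdict(list)
    for sample, n_genes, pct_mt in cells:
        c = cutoffs[sample]
        if n_genes >= c["min_genes"] and pct_mt <= c["max_pct_mt"]:
            kept[sample].append((n_genes, pct_mt))
    return dict(kept)


def retention(cells, kept):
    """Fraction of cells retained per sample after filtering."""
    totals = defaultdict(int)
    for sample, _, _ in cells:
        totals[sample] += 1
    return {s: len(kept.get(s, [])) / n for s, n in totals.items()}


kept = filter_per_sample(cells, cutoffs)
fractions = retention(cells, kept)
for sample, frac in sorted(fractions.items()):
    print(f"{sample}: kept {frac:.0%} of cells")
```

Rerunning this with a few different cutoff sets and comparing the retained fraction per sample is a quick way to see how sensitive each sample is to the stringency you pick.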
Thanks for the reply! What other plots would you recommend making to evaluate the QC?
One of the reasons for the low gene counts is likely the low sequencing depth, as the collaborating group wanted to see if the samples were okay post-sequencing before investing in deeper sequencing.
I will for sure take the approach you suggested with the cell type per sample filtering, but is there a good way to answer the question: "should we sequence this to further depth or accept the failure of this experiment?"
Thanks again!
I would check whether you see the expected cell types with their expected markers, and whether you see any evidence of interesting differences between the biological groups you have in there.
The MT count for D8 is high, but it could be biologically relevant.
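The marker check suggested above could be sketched roughly like this, again in plain Python. The cluster names, counts, and marker choices (Epcam for epithelial cells, Ptprc for immune cells) are assumptions picked for illustration, not taken from this dataset, and the 0.5 detection threshold is arbitrary. The idea is simply to ask, per cluster, in what fraction of cells each expected marker is detected at all, which stays informative even at shallow depth.

```python
# Made-up raw counts per cell, grouped by hypothetical cluster labels.
clusters = {
    "c0": [
        {"Epcam": 5, "Ptprc": 0},
        {"Epcam": 3, "Ptprc": 0},
        {"Epcam": 0, "Ptprc": 0},
    ],
    "c1": [
        {"Epcam": 0, "Ptprc": 4},
        {"Epcam": 0, "Ptprc": 2},
    ],
}

# Illustrative marker per expected broad cell type (an assumption).
markers = {"epithelial": "Epcam", "immune": "Ptprc"}


def detection_fraction(cells, gene):
    """Fraction of cells with at least one count for the gene."""
    return sum(1 for c in cells if c.get(gene, 0) > 0) / len(cells)


def marker_table(clusters, markers):
    """Detection fraction of every marker in every cluster."""
    return {
        cluster: {
            label: detection_fraction(cells, gene)
            for label, gene in markers.items()
        }
        for cluster, cells in clusters.items()
    }


def best_label(row, min_fraction=0.5):
    """Best-supported label, or None if no marker is detected widely enough."""
    label, frac = max(row.items(), key=lambda kv: kv[1])
    return label if frac >= min_fraction else None


table = marker_table(clusters, markers)
labels = {cluster: best_label(row) for cluster, row in table.items()}
for cluster in sorted(labels):
    print(cluster, labels[cluster], table[cluster])
```

If most clusters come back without a clear label, or the expected cell types are missing, that argues against paying for deeper sequencing. If the expected populations are there and only look thin, more depth is more likely to help.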