Hi, we are working on a scRNA-seq dataset: a population of cells that has been treated with either control (VEH) or treatment (IMQ). I have some questions about interpreting PCA.
We ran PCA on the integrated dataset and plotted PC1 vs PC2, grouped by stimulation, and we see a homogeneous overlap of control and treated cells. If I understand correctly, this means that the largest sources of variation among the most variable genes do not separate the cells by treatment, and perhaps separate them by cell type instead (or by another parameter?).
My second question concerns cell cycle stages within this population, which are of biological interest to us. Following the Seurat vignette, we did not regress out any of the cell cycle genes. Afterward, we wondered whether the cell cycle genes had an effect on our downstream clustering, so we ran PCA with the cell cycle genes as input and got this PC plot. What I don't understand is what this means exactly. You can see that the PC1 vs PC2 plot is not homogeneous for cell cycle, and that G1 cells seem to form their own section of the plot.
What does this mean exactly? Does it mean it's necessary to regress out the cell cycle genes before attempting to cluster? We would prefer that the cells cluster by cell type rather than by another parameter. Also, if we regress out the cell cycle genes, is it still possible to analyze the cell cycle stage once the clustering is done?
If you are inputting the cell cycle genes into the RunPCA function, wouldn't you expect it to sort out based on cell cycle?
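To illustrate the point, here is a minimal sketch on simulated data (plain NumPy, not Seurat; all names, gene counts, and effect sizes are made up for the example). It assumes 20 hypothetical "cell cycle" genes that shift with phase and 80 other genes that shift with cell type. PCA restricted to the cell cycle genes separates cells by phase almost by construction, while PCA on all genes is dominated by cell type in this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 300

# Hypothetical labels for the simulation: phase 0 = G1, 1 = cycling;
# cell_type 0/1 = two made-up cell types, assigned independently of phase.
phase = rng.integers(0, 2, size=n_cells)
cell_type = rng.integers(0, 2, size=n_cells)

n_cc, n_other = 20, 80  # assumed numbers of cell cycle and other genes
expr = rng.normal(0.0, 1.0, size=(n_cells, n_cc + n_other))
expr[:, :n_cc] += 3.0 * phase[:, None]      # cell cycle genes follow phase
expr[:, n_cc:] += 3.0 * cell_type[:, None]  # remaining genes follow cell type


def pc1_scores(x):
    """Return each cell's score on the first principal component."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def abs_corr(a, b):
    """Absolute Pearson correlation (the sign of a PC is arbitrary)."""
    return abs(np.corrcoef(a, b)[0, 1])


pc1_cc = pc1_scores(expr[:, :n_cc])  # PCA on cell cycle genes only
pc1_all = pc1_scores(expr)           # PCA on all genes

print("cell cycle genes only, PC1 vs phase:", abs_corr(pc1_cc, phase))
print("all genes, PC1 vs phase:            ", abs_corr(pc1_all, phase))
print("all genes, PC1 vs cell type:        ", abs_corr(pc1_all, cell_type))
```

So a plot that separates by phase when only cell cycle genes are given to the PCA does not, on its own, show that cell cycle is driving the clustering of the full dataset. A more informative check would be to colour the PCA computed on the variable features by phase and see whether the phases separate there.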