Question

PCA plot interpretation from scRNA data set

5.7 years ago

cook.675 ▴ 250

Hi, we are working on an scRNA-seq data set from a population of cells treated with either control (VEH) or treatment (IMQ), and I have some questions about interpreting the PCA.

We ran PCA on the integrated dataset and plotted PC1 vs PC2, grouped by stimulation, and we notice a homogeneous overlap of control and treated cells. If I understand it correctly, this means that the most variable genes in the data set do not separate the cells according to treatment, and perhaps instead separate them according to cell type (or another parameter?).

My second question: we are looking at cell cycle stages within this population, as this is of biological interest to us. We did not regress out any of these genes as per the Seurat vignette. Afterward, we wondered whether the cell cycle genes affected our downstream clustering, so we ran PCA with the cell cycle genes as input and got this PC plot. What I don't understand is what this means exactly. You can see that the PC1 vs PC2 plot is not homogeneous for cell cycle, and that G1 cells seem to form their own section on this graph.

What does this mean exactly? Does this mean it's necessary to regress out the cell cycle genes before attempting to cluster? We would prefer the cells to cluster by cell type rather than by another parameter. Also, if we regress out the cell cycle genes, is it still possible to analyze the cell cycle stage once the clustering is done?

If you are inputting the cell cycle genes into the RunPCA function, wouldn't you expect the cells to sort out based on cell cycle?
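
To illustrate the point about feeding only cell cycle genes into a PCA, here is a minimal Python sketch on synthetic data. It is not the Seurat `RunPCA` implementation: the two "cell cycle genes", the size of the G2M shift, and the helper `first_pc_scores` are all hypothetical. Because the only structured variance in the input comes from phase, PC1 separates the phases regardless of treatment.

```python
import random


def first_pc_scores(rows):
    """Project rows (cells x genes) onto the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (genes x genes).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the leading eigenvector of cov.
    v = [1.0] * p
    for _ in range(200):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


# Synthetic cells (assumption): phase alternates, treatment is independent of phase.
rng = random.Random(0)
cells = []
for i in range(60):
    phase = "G1" if i % 2 == 0 else "G2M"
    treatment = "VEH" if i < 30 else "IMQ"
    shift = 5.0 if phase == "G2M" else 0.0  # hypothetical phase effect
    cc_genes = [shift + rng.gauss(0, 0.5), shift + rng.gauss(0, 0.5)]
    cells.append((phase, treatment, cc_genes))

scores = first_pc_scores([c[2] for c in cells])
g1 = [s for s, c in zip(scores, cells) if c[0] == "G1"]
g2m = [s for s, c in zip(scores, cells) if c[0] == "G2M"]

# The sign of a principal component is arbitrary, so check both orderings.
separated = max(g1) < min(g2m) or max(g2m) < min(g1)
print("PC1 separates G1 from G2M:", separated)
```

A phase split on such a plot therefore mostly reflects the choice of input genes. On its own it does not show that cell cycle drives the clustering obtained from the full set of variable genes.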

PCA scRNA seq seurat • 2.5k views

ADD COMMENT • link 5.7 years ago by cook.675 ▴ 250

Please use the image button to embed your images. Paste the full link including the suffix (e.g. .png) into it.

See How to add images to a Biostars post.

ADD REPLY • link 5.7 years ago by ATpoint 88k

To me, it looks like 1) each of your samples has two different cell identities; 2) treatment does not have much effect on gene expression.

ADD REPLY • link 5.7 years ago by shoujun.gu ▴ 350

With regard to point 2), this is not the case when we look at a list of DEGs by cluster. We see a wide range of effects on genes within the clusters that we would expect biologically with treatment.

About point 1), is the implication that this is because the cells form two groups on the PC plot? How would one investigate this further?

Could you comment on the cell cycle states?

ADD REPLY • link 5.7 years ago by cook.675 ▴ 250

How did you get the DEG list? If you compared by cluster (in your figures, each cluster contains both ctrl and treated samples), you only get the DE genes between clusters, not between samples.
Based on your figure (https://imgur.com/a/0BQXR5P), your cells split into two groups (each containing both VEH and IMQ). The downstream analysis is project dependent, but in general you could 1) use marker genes to identify these groups (if you know some marker genes); 2) run a DE analysis between the groups (it seems you've already done that) to find the DE genes.
As for cell cycle removal, instead of using cell cycle genes as input for PCA, you should regress out the cell cycle effect and plot it again.
Also try plotting a UMAP or t-SNE for better visualization.
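
As a rough sketch of what "regressing out" a cell cycle effect means, the Python below fits a one-covariate least-squares line for a gene and keeps the residuals. It is a simplification and not Seurat's code: Seurat's `ScaleData` takes a `vars.to.regress` argument, typically the S and G2M scores, while the function `regress_out` and all numbers here are made up for illustration. The phase labels are stored separately from the expression values, so they stay available for analysis after the regression.

```python
def regress_out(expr, covariate):
    """Return residuals of expr after a least-squares fit on one covariate."""
    n = len(expr)
    mean_x = sum(covariate) / n
    mean_y = sum(expr) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, expr))
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(covariate, expr)]


# Hypothetical per-cell values: a cell cycle score and a cell-type effect.
cc_score = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9]
type_effect = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0, 3.0, 3.0]
phase = ["G1", "G2M", "G1", "G2M", "G1", "G2M", "G1", "G2M"]

# One gene whose expression mixes both sources of variation.
gene = [2.0 * s + t for s, t in zip(cc_score, type_effect)]

residuals = regress_out(gene, cc_score)

# The residuals keep the cell-type signal and no longer track the score;
# the phase labels are untouched and can still be plotted afterwards.
for p, r in zip(phase, residuals):
    print(p, round(r, 3))
```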

ADD REPLY • link 5.7 years ago by shoujun.gu ▴ 350

Thanks so much, I will try these things!

ADD REPLY • link 5.7 years ago by cook.675 ▴ 250

My question, as someone who analyses scRNA-seq data, is why you are even spending time looking at the PCA plot. This will be very uninformative. You should only look at the PCs to see how many of them are needed to capture as much variability as you can in your dataset. Look at your elbow plot (or JackStraw plot) and select which PCs are appropriate to use for dimensionality reduction to make your t-SNE and then your UMAP. That would be the most appropriate way of doing this, in my opinion.

If you want to cover all your bases, I would try several different numbers of PCs for dimensionality reduction and compare the outputs, e.g. use 5, then 10, then 15, then 20, and see how your resulting UMAPs look.
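
One way to make the elbow-plot reading more concrete is to compute the cumulative share of variance from the per-PC standard deviations that the elbow plot displays. This Python sketch uses hypothetical standard deviations and a hypothetical helper `pcs_needed`. The share is relative to the PCs that were computed, not to the total variance in the data, and the 90% threshold is an arbitrary assumption.

```python
def pcs_needed(stdevs, threshold=0.9):
    """Smallest number of leading PCs whose variance share reaches threshold."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    running = 0.0
    for k, v in enumerate(variances, start=1):
        running += v
        if running / total >= threshold:
            return k
    return len(stdevs)


# Hypothetical per-PC standard deviations, sorted as on an elbow plot.
stdevs = [10.0, 6.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]

print(pcs_needed(stdevs))       # → 3
print(pcs_needed(stdevs, 0.6))  # → 1
```

A cutoff like this is only a starting point. Comparing the embeddings obtained with a few different numbers of PCs, as suggested above, remains a useful check.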

ADD REPLY • link 5.7 years ago by V ▴ 420