Question: PCA on selected genes.
19 months ago by
a.archana0 wrote:

I am analysing RNA-seq data. I have 45 samples: nine treatments with five replicates each (9 × 5 = 45).

I ran a PCA and the samples do not separate well by treatment. Variance explained: PC1 = 12%, PC2 = 10%, PC3 = 7%.

I did pairwise differential expression (DE) analysis among the samples, and I am planning to do PCA on the DE genes only. I have a normalised matrix of all the genes. My question is:

To do PCA on selected genes, should I go: normalised matrix of all genes -> correlation matrix -> retrieve DE genes only -> prcomp

OR: normalised gene matrix -> retrieve DE genes -> correlation matrix -> prcomp?

Any help would be much appreciated. Thanks
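
The two orderings in the question can be compared directly. The following is a minimal sketch, assuming the "correlation matrix" means gene-gene correlations across samples; the matrix `expr` is simulated and the IDs in `de_genes` are placeholders, not real results. Under that assumption each pairwise correlation depends only on the two genes involved, so both orders give the same matrix. Note that `prcomp` takes the data matrix itself (samples in rows) rather than a correlation matrix.

```r
set.seed(42)
# Hypothetical normalised matrix: 100 genes (rows) x 45 samples (columns).
expr <- matrix(rnorm(100 * 45), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("S", 1:45)))
de_genes <- paste0("gene", c(3, 17, 42, 58, 91))  # placeholder DE gene IDs

# Order 1: correlate all genes, then keep the DE rows and columns.
cor_then_subset <- cor(t(expr))[de_genes, de_genes]
# Order 2: keep the DE genes, then correlate.
subset_then_cor <- cor(t(expr[de_genes, ]))
all.equal(cor_then_subset, subset_then_cor)

# prcomp works on the data matrix (samples in rows). With scale. = TRUE,
# the component variances equal the eigenvalues of the gene-gene
# correlation matrix of the selected genes.
pca_de <- prcomp(t(expr[de_genes, ]), center = TRUE, scale. = TRUE)
all.equal(pca_de$sdev^2, eigen(subset_then_cor)$values)
```

In practice this suggests subsetting the normalised matrix to the genes of interest and passing that straight to `prcomp`, without building a correlation matrix by hand.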


Thanks everyone. Thanks for your input.

written 19 months ago by a.archana0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

modified 19 months ago • written 19 months ago by genomax73k
19 months ago by
United States
Friederike5.2k wrote:

Why would you want to do PCA on DE genes? You already know that these genes are going to separate the treatments since this is, presumably, the comparison you did to get those DE genes in the first place.

If you see that PCA on normalized (!) values does not yield the expected pattern, then this is an indication that you might have batch effects that explain more variability than your conditions of interest. That is valuable information, as you might be able to identify the factor that explains the batch effect (a typical example would be the type of sample instead of the type of treatment).
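
A minimal sketch of how one might check this in R, assuming simulated data: the matrix `expr`, the `batch` labels and the injected batch shift are all made up for illustration, and only the 9 × 5 design comes from the question. It runs `prcomp` on all normalised values, reports the percentage of variance per component, and compares how well PC1 is explained by a candidate batch factor versus treatment.

```r
set.seed(1)
n_genes <- 200
treatment <- factor(rep(paste0("T", 1:9), each = 5))     # 9 treatments x 5 replicates
batch <- factor(rep(c("A", "B", "C"), length.out = 45))  # hypothetical batch labels

# Simulated stand-in for a normalised expression matrix (genes x samples).
expr <- matrix(rnorm(n_genes * 45), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("S", 1:45)))
# Shift 50 genes in batch "B" so that batch drives a large share of the variance.
expr[1:50, batch == "B"] <- expr[1:50, batch == "B"] + 3

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(pct_var[1:3], 1)

# R^2 of PC1 scores against each factor (one-way linear model).
r2_batch <- summary(lm(pca$x[, 1] ~ batch))$r.squared
r2_treatment <- summary(lm(pca$x[, 1] ~ treatment))$r.squared
c(batch = r2_batch, treatment = r2_treatment)
```

With real data, any recorded sample annotation (sample type, library prep date, sequencing lane) could take the place of `batch` in the comparison.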


Thanks Friederike.

I should have been clearer in my question.

We are not expecting the samples to separate completely by treatment. Some of the treatments have a stronger effect than others (this is also what we observe). From known studies on similar species, we know there are some genes that show a stronger treatment effect. We want to see whether these known genes show a similar effect in our study, which is why I want to do PCA on selected genes.

I hope this makes sense. Thanks

written 19 months ago by a.archana0

So select the genes of interest and do a heatmap with them. You can also do a Venn diagram to see how many of the known genes are found by your study as well.
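
The counts behind such a Venn diagram can be computed with base R set operations. This is a small sketch only; the gene IDs in `known_genes` and `de_genes` are placeholders to be replaced with the real lists.

```r
# Placeholder gene IDs; substitute the published list and your own DE results.
known_genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
de_genes <- c("geneB", "geneD", "geneF", "geneG")

shared <- intersect(known_genes, de_genes)     # known genes also found in this study
known_only <- setdiff(known_genes, de_genes)   # known genes not recovered here
de_only <- setdiff(de_genes, known_genes)      # DE genes not in the known list

c(shared = length(shared),
  known_only = length(known_only),
  de_only = length(de_only))
```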

written 19 months ago by h.mon27k

I agree with both Friederike and h.mon here. Following from h.mon's point, just do the clustering with a heatmap and think about further refining your DEGs via regression modelling and 'gene signature' creation.

The only realm where I have seen PCA used on DEGs is when one would want to develop a new scoring system based on eigenvalues, as is performed in WGCNA, i.e., network analysis.

written 19 months ago by Kevin Blighe49k

Thanks h.mon.

I am planning to do this, and I have a similar question about it. To calculate z-scores for the heatmap, should I (1) calculate z-scores on all genes (normalised reads) and then pull out my genes of interest for the heatmap, or (2) take my genes of interest (normalised reads) and then calculate z-scores? Does this change the result?


written 19 months ago by a.archana0

I don't understand the distinction you're making, so I would just say:

  1. make a matrix of normalized expression values for all your genes of interest (e.g., your DEGs) and all your samples
  2. use, for example, pheatmap, which will allow you to see the effects of row- and column-based z-scores as well as the actual values without z-score transformation (you can set this via the parameter scale = c("none","row","column") )
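
On whether the order of subsetting and z-scoring changes the result: a minimal base R sketch, assuming a simulated matrix `expr` and placeholder IDs in `genes_of_interest`. Row z-scores are computed per gene across samples, so they do not depend on which other genes are in the matrix. Column z-scores are computed per sample across genes, so they do.

```r
set.seed(7)
# Hypothetical normalised matrix: 50 genes (rows) x 45 samples (columns).
expr <- matrix(rnorm(50 * 45, mean = 8, sd = 2), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("S", 1:45)))
genes_of_interest <- paste0("gene", c(2, 11, 23, 40))  # placeholder IDs

# Row z-score: centre and scale each gene across samples.
row_z <- function(m) t(scale(t(m)))

z_all_then_subset <- row_z(expr)[genes_of_interest, ]
z_subset_then_all <- row_z(expr[genes_of_interest, ])
same_row_z <- isTRUE(all.equal(z_all_then_subset, z_subset_then_all,
                               check.attributes = FALSE))

# Column z-score: each sample is scaled using whichever genes are present,
# so subsetting before or after scaling gives different values.
col_all_then_subset <- scale(expr)[genes_of_interest, ]
col_subset_then_all <- scale(expr[genes_of_interest, ])
same_col_z <- isTRUE(all.equal(col_all_then_subset, col_subset_then_all,
                               check.attributes = FALSE))

c(row = same_row_z, column = same_col_z)
```

So for a heatmap with row scaling, passing only the genes of interest and letting the plotting function scale by row should give the same values as scaling first and subsetting afterwards.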
modified 19 months ago • written 19 months ago by Friederike5.2k