Question: Is this a good PCA result?
8 months ago by
fernardo 130
Italy
fernardo 130 wrote:

Hello All,

Can somebody please tell me whether this PCA result is a good one, and what the recommended way to validate it is?

Note: the PCA is based on around 20 features and the samples are around 100.

[Image: PCA plot]

Thanks a lot

modified 8 months ago • written 8 months ago by fernardo 130

What question do you want to answer?

How to add images to a Biostars post

modified 8 months ago • written 8 months ago by ATpoint 26k

Actually, I asked a question; I am not trying to answer one :) Thanks for the link, too.

written 8 months ago by fernardo 130
8 months ago by
genomax 74k
United States
genomax 74k wrote:

We can see a clear separation along the two components you are plotting, but beyond that there is not enough information to make any judgement. You need to say more about the experiment you are working on and whether these components represent the main effect you are trying to study.

written 8 months ago by genomax 74k

Thanks. The study compares two conditions (disease vs. normal).

written 8 months ago by fernardo 130

Then it looks like you have a clear difference between them.

written 8 months ago by Devon Ryan 92k

You are just doing PCA on the differentially expressed genes, right? Is that the 20 genes? You may also want to show the separation in a cluster dendrogram and heatmap.

written 8 months ago by Kevin Blighe 51k

@Devon and @Kevin, thanks to both of you. I am picking genes randomly, and most of them are not differentially expressed, or at least not statistically significantly so. My point is that perhaps only 3 of those 20 genes are differentially expressed and produce this output. Can this be significant? Also, would a heatmap and clustering be enough to demonstrate this separation? And what if I add a classification method such as SVM? I have already applied one, and the accuracy and Kappa values are very high.

written 8 months ago by fernardo 130
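A note on the Kappa value mentioned above: Cohen's kappa corrects the observed accuracy for the agreement expected by chance, so it can be low even when accuracy is high, for example when one class dominates. The following is a minimal sketch of that calculation, not the poster's actual code; the function name `cohen_kappa` and the example labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Cohen's kappa: agreement between two label lists, corrected for chance.

    Undefined (division by zero) when chance agreement is exactly 1,
    e.g. when both lists contain a single identical class.
    """
    n = len(true_labels)
    # Observed agreement is simply the accuracy.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Chance agreement: product of the marginal class frequencies, summed.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical imbalanced example: 90 disease (1), 10 normal (0),
# and a classifier that always predicts "disease".
truth = [1] * 90 + [0] * 10
always_disease = [1] * 100
accuracy = sum(t == p for t, p in zip(truth, always_disease)) / len(truth)
print(accuracy, cohen_kappa(truth, always_disease))  # accuracy 0.9, kappa 0.0
```

For either metric to be meaningful, the predictions should come from samples the classifier was not trained on, and any gene selection should be repeated inside each cross-validation fold rather than done once on the full dataset.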

Picking genes randomly does not sound scientific in this situation. Why would you do that? Why not do PCA on the entire dataset?

Usually, people perform a differential expression analysis and then subset their original data matrix to the statistically significant genes. Clustering with heatmap generation may then be performed on the subset data matrix.

written 8 months ago by Kevin Blighe 51k
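The workflow described above (test each gene, then subset the matrix) can be sketched as follows. This is a simplified illustration under stated assumptions: it ranks genes by the absolute Welch t statistic instead of applying a p-value cutoff with multiple-testing correction, which a real differential expression analysis with a dedicated package would do. The names `welch_t`, `top_genes` and `subset_columns` and the toy matrix are hypothetical.

```python
import statistics


def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def top_genes(matrix, labels, k):
    """Indices of the k genes (columns) with the largest |t| between groups.

    matrix: list of samples, each a list of gene values.
    labels: 1 for disease, 0 for normal, one per sample.
    """
    ranked = []
    for j, column in enumerate(zip(*matrix)):
        disease = [x for x, lab in zip(column, labels) if lab == 1]
        normal = [x for x, lab in zip(column, labels) if lab == 0]
        ranked.append((abs(welch_t(disease, normal)), j))
    return [j for _, j in sorted(ranked, reverse=True)[:k]]


def subset_columns(matrix, columns):
    """Keep only the selected gene columns, in the given order."""
    return [[row[j] for j in columns] for row in matrix]


# Toy data: 3 disease and 3 normal samples, 3 genes; only gene 1 differs.
labels = [1, 1, 1, 0, 0, 0]
matrix = [
    [1.0, 10.0, 2.0], [1.2, 10.5, 1.8], [0.9, 9.8, 2.1],
    [1.1, 1.0, 2.0], [1.0, 1.2, 1.9], [1.15, 0.9, 2.2],
]
selected = top_genes(matrix, labels, k=1)
reduced = subset_columns(matrix, selected)  # input for clustering / heatmap
print(selected)
```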

I have two points in reply.

First, if a subset of genes gives me the same output as the entire dataset, why is that not useful and scientific? It gives the same good result with less effort and less information. What do you think?

Second, following what others generally do, such as DE analysis and heatmaps, is not mandatory, and I believe it discourages developing new approaches.

modified 8 months ago • written 8 months ago by fernardo 130

Hey, well, in that case, you should be performing the random sampling many times and then checking the reproducibility of the results. Another name for this is bootstrapping.

I do not 100% understand your second point. Clustering / heatmap can show to what degree a panel of genes can segregate, for example, cases and controls.

written 8 months ago by Kevin Blighe 51k
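The repeated random sampling suggested above could look roughly like the sketch below: draw many random 20-gene subsets, run PCA on each, and record how well the first component separates the two conditions. Everything here is an assumption made for illustration: the data are synthetic, PC1 is computed by a simple power iteration rather than an established PCA routine, and the separation score (difference in group means on PC1 in within-group SD units) is one arbitrary choice. Note also that this draws gene subsets without replacement, whereas the bootstrap in the strict sense resamples with replacement.

```python
import random
import statistics


def pc1_scores(rows, n_iter=50):
    """Sample scores on the first principal component (power iteration)."""
    means = [statistics.fmean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    rng = random.Random(0)
    v = [rng.gauss(0, 1) for _ in means]  # random start direction
    for _ in range(n_iter):
        # Multiply v by X^T X (proportional to the covariance), then renormalize.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(means))]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def separation(scores, labels):
    """|difference of group means| on PC1, in average within-group SD units."""
    a = [s for s, lab in zip(scores, labels) if lab == 1]
    b = [s for s, lab in zip(scores, labels) if lab == 0]
    sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / sd


def subset_separations(matrix, labels, n_genes=20, n_repeats=30, seed=1):
    """Separation score for many random gene subsets (columns of matrix)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        cols = rng.sample(range(len(matrix[0])), n_genes)
        sub = [[row[j] for j in cols] for row in matrix]
        results.append(separation(pc1_scores(sub), labels))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 100 samples, 200 genes,
    # of which the first 30 are shifted in the disease group.
    rng = random.Random(42)
    labels = [1] * 50 + [0] * 50  # 1 = disease, 0 = normal
    matrix = []
    for lab in labels:
        row = [rng.gauss(0, 1) for _ in range(200)]
        for j in range(30):
            row[j] += 2.0 * lab
        matrix.append(row)
    seps = subset_separations(matrix, labels)
    print("median separation:", round(statistics.median(seps), 2))
    print("fraction of subsets with separation > 2:",
          sum(s > 2 for s in seps) / len(seps))
```

If most random subsets separate the groups about as well as the chosen 20 genes, the separation reflects a broad signal in the data rather than anything specific to that panel; if only a few do, the result depends heavily on which genes happened to be drawn.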

Yes, exactly, I do random sampling / bootstrapping.

written 8 months ago by fernardo 130