Question: Reporting a p-value for my PCA results

Shamim Sarhadi (Iran) wrote, 3.1 years ago:

Hi, I've performed PCA on my gene expression data after DEG analysis. I can see that my case and control samples cluster distinctly, but I want to report a p-value for this result. It is worth noting that I have a coordinates CSV file giving the coordinates of the samples from the different phenotypes (case vs. control).

Thanks in advance

Tags: pca, gene expression
(Question modified 3.1 years ago by Kevin Blighe)

PCA is an exploratory data analysis method. It does not test a null hypothesis or generate a p-value.

(Comment by Nicolas Rosewick, 3.1 years ago)

Perhaps move this to an answer?

(Comment by WouterDeCoster, 3.1 years ago)

OK, I'll try to explain a little more, then.

(Comment by Nicolas Rosewick, 3.1 years ago)

Answer: Nicolas Rosewick (Brussels, Belgium) wrote, 3.1 years ago:

PCA is an exploratory data analysis method. It does not test a null hypothesis or generate a p-value.

If you want to compute a p-value, you could try the pvclust package in R. It does not use PCA; instead, it performs hierarchical clustering and reports p-values for each sub-tree, computed by multiscale bootstrap resampling.

http://stat.sys.i.kyoto-u.ac.jp/prog/pvclust/
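A minimal sketch of a pvclust run, assuming the package is installed from CRAN (`install.packages("pvclust")`). The matrix `expr` is simulated and hypothetical: 100 genes x 12 samples, where the six "case" samples get a shift in the first 20 genes. With real data you would pass your own genes x samples matrix; note that pvclust clusters the columns.

```r
library(pvclust)

set.seed(1)
# Hypothetical genes x samples matrix: 6 controls, 6 cases
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               c(paste0("ctrl", 1:6), paste0("case", 1:6))))
# Simulated effect: cases shifted upwards in the first 20 genes
expr[1:20, 7:12] <- expr[1:20, 7:12] + 3

# Hierarchical clustering of the samples (columns) with multiscale bootstrap
pv <- pvclust(expr, method.hclust = "average",
              method.dist = "correlation", nboot = 1000)

print(pv)                   # AU and BP values for every edge (sub-tree)
plot(pv)                    # dendrogram annotated with AU/BP values
pvrect(pv, alpha = 0.95)    # box the clusters with AU >= 0.95
print(pvpick(pv, alpha = 0.95))  # the same clusters as a list
```

The AU (approximately unbiased) value is the one usually reported as the cluster's p-value-like support; BP is the ordinary bootstrap probability.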

(Answer modified 3.1 years ago)

pvclust is indeed great: it bootstraps the clustering.

(Comment by Kevin Blighe, 3.1 years ago)

Actually, you can get p-values for the number of principal components; there has been some recent work on this. Unfortunately, I have not seen it implemented in any of the PCA implementations that I use :( https://projecteuclid.org/download/pdfview_1/euclid.aos/1513328584

(Comment by John St. John, 2.0 years ago)

Answer: Kevin Blighe (Republic of Ireland) wrote, 3.1 years ago:

What Nicolas says is true; however, there are indirect ways to derive a P value based on your PCA results.

Build a regression model

You can indirectly derive P values in your situation by building a binary logistic regression model that uses the PC1 (or PC2, PC3, ..., PCx) values to predict case/control status. Given the clear separation you describe, you should see a small (highly significant) P value.

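# family = binomial makes glm() fit a logistic regression; CaseControl should be a factor or 0/1
summary(glm(CaseControl ~ PC1, family = binomial))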
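A minimal, self-contained sketch of the regression approach, assuming the PCA coordinates sit in a data frame with one row per sample. The data are simulated and hypothetical (15 controls and 15 cases with PC1 means of -0.5 and 0.5); with the real CSV you would load it with read.csv(), and the file and column names in the comments are assumptions.

```r
set.seed(42)
n <- 15  # samples per group (hypothetical)

# With real data, something like (file and column names are assumptions):
# coords <- read.csv("pca_coordinates.csv")
# coords$CaseControl <- factor(coords$CaseControl, levels = c("control", "case"))
coords <- data.frame(
  PC1 = c(rnorm(n, mean = -0.5), rnorm(n, mean = 0.5)),
  CaseControl = factor(rep(c("control", "case"), each = n),
                       levels = c("control", "case"))
)

# Logistic regression: models P(case), since "control" is the first factor level
fit <- glm(CaseControl ~ PC1, data = coords, family = binomial)
print(summary(fit))
wald_p <- coef(summary(fit))["PC1", "Pr(>|z|)"]  # Wald z-test P value for PC1

# Likelihood-ratio test of PC1 against the intercept-only model
lrt <- anova(fit, test = "Chisq")
print(lrt)
lrt_p <- lrt["PC1", "Pr(>Chi)"]
```

If the two groups are perfectly separated on PC1, glm() warns that fitted probabilities numerically 0 or 1 occurred, and the Wald P value from summary() is then unreliable; the likelihood-ratio P value is generally better behaved in that situation. Also, if the PCA was run only on genes selected as differentially expressed between case and control, the separation is partly built in, so any such P value will be optimistic.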

Correlation

For other types of variables, such as continuous variables, you can simply run a correlation test between the variable and the PC1 values and derive a P value from that too (in R, use cor.test()).
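A minimal sketch of the correlation approach. Everything here is simulated and hypothetical: a continuous covariate `age` for 30 samples and a 100-gene matrix in which 30 genes track that covariate, so that it is likely to dominate PC1. The PC1 scores come from prcomp(), which expects samples in rows.

```r
set.seed(7)
n <- 30          # samples (hypothetical)
n_genes <- 100
n_signal <- 30   # genes driven by the covariate (hypothetical)

age <- runif(n, min = 20, max = 80)
z <- as.numeric(scale(age))

# Genes x samples matrix; the first n_signal genes follow the covariate
expr <- matrix(rnorm(n_genes * n), nrow = n_genes)
expr[1:n_signal, ] <- expr[1:n_signal, ] +
  3 * matrix(z, nrow = n_signal, ncol = n, byrow = TRUE)

# PCA with samples in rows; keep the PC1 scores
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
pc1 <- pca$x[, 1]

# Pearson correlation test between PC1 and the covariate
res <- cor.test(pc1, age, method = "pearson")
print(res)

# Rank-based alternative, less sensitive to outliers
print(cor.test(pc1, age, method = "spearman"))
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation (and its P value) is meaningful here.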

(Answer modified 2.0 years ago)

That's a good point!

(Comment by Nicolas Rosewick, 3.1 years ago)