PCA to find clusters in gene expression data
23 months ago
Hashirama ▴ 20

Dear community,

I finally obtained gene expression data from cancer patients and from healthy patients as controls. I want to investigate whether the expression data form distinct clusters, so I performed PCA for dimensionality reduction and plotted the result with sns.scatterplot: (PCA scatter plot, image not available)

You can see that the cancer patients form one cluster (blue) and the healthy patients form another (orange). But:

1.) Is it valid to use PCA to identify clusters, or do I need other methods, such as t-SNE plots or k-means clustering?
2.) How can I show that the clusters are significantly different? Can I calculate p-values? Is it also possible to plot confidence ellipses?

I would be grateful for any help!
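For readers following along, the PCA step described in the question can be sketched roughly as below. This is a minimal illustration on synthetic data, not the poster's actual code: the matrix `expr`, the group sizes, and the helper `pca_scores` are all assumptions, and PCA is done with a plain NumPy SVD rather than any particular library.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """Project a samples x genes matrix onto its first principal components."""
    centered = expr - expr.mean(axis=0)           # centre each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)           # variance fraction per PC
    return scores, explained[:n_components]

# Synthetic stand-in for log-transformed expression data (assumption).
rng = np.random.default_rng(0)
n_genes = 200
healthy = rng.normal(0.0, 1.0, size=(10, n_genes))
cancer = rng.normal(0.0, 1.0, size=(10, n_genes))
cancer[:, :40] += 3.0                             # 40 genes shifted in cancer

expr = np.vstack([cancer, healthy])
labels = np.array(["cancer"] * 10 + ["healthy"] * 10)

scores, explained = pca_scores(expr)
```

The resulting `scores[:, 0]` and `scores[:, 1]` can be passed to `sns.scatterplot(x=..., y=..., hue=labels)`. Note that PCA only projects the samples; it does not assign them to clusters, and the colours in such a plot come from labels that were already known.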

expression clustering PCA • 771 views

This is a case where you don’t need a statistical test to find out whether the clusters are distinct.
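If a number or an ellipse is still wanted for a figure, one possible approach is a label-permutation test on the distance between group centroids, plus a covariance-based ellipse per group. The sketch below uses synthetic 2-D PCA scores; the function names, group sizes, and the choice of centroid distance as the test statistic are assumptions for illustration. The ellipse is a 95% data ellipse that assumes roughly bivariate normal scores within each group (5.991 is the 95% quantile of a chi-square distribution with 2 degrees of freedom).

```python
import numpy as np

def centroid_distance(points, is_case):
    """Euclidean distance between the two group centroids."""
    return np.linalg.norm(points[is_case].mean(axis=0) - points[~is_case].mean(axis=0))

def permutation_pvalue(points, is_case, n_perm=2000, seed=0):
    """Shuffle group labels to see how often a distance this large arises by chance."""
    rng = np.random.default_rng(seed)
    observed = centroid_distance(points, is_case)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_case)
        if centroid_distance(points, shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

def data_ellipse(points, chi2_quantile=5.991):
    """Centre, full axis lengths, and rotation (degrees) of a covariance ellipse."""
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    major = eigvecs[:, 1]                         # direction of largest variance
    angle = np.degrees(np.arctan2(major[1], major[0]))
    width, height = 2 * np.sqrt(chi2_quantile * eigvals[::-1])
    return center, width, height, angle

# Synthetic PC1/PC2 scores for two well-separated groups (assumption).
rng = np.random.default_rng(1)
cancer_pc = rng.normal(loc=[5.0, 0.0], scale=1.0, size=(10, 2))
healthy_pc = rng.normal(loc=[-5.0, 0.0], scale=1.0, size=(10, 2))
points = np.vstack([cancer_pc, healthy_pc])
is_case = np.array([True] * 10 + [False] * 10)

observed, p_value = permutation_pvalue(points, is_case)
center, width, height, angle = data_ellipse(cancer_pc)
```

The values returned by `data_ellipse` match the arguments of `matplotlib.patches.Ellipse(xy, width, height, angle=...)`. The smallest p-value this test can report is 1 / (n_perm + 1), so for groups as clearly separated as in the question the result mostly confirms what the plot already shows.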

23 months ago
ATpoint 81k

All I see is two distinct populations and one outlier. Remove the outlier sample and compare orange to blue. Does that make sense?
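One way to flag such an outlier instead of picking it by eye is to measure each sample's distance from its own group's median position in PC space and mark samples whose robust z-score is large. This is only a sketch on synthetic scores: the helper `flag_outliers`, the threshold of 3.5, and the planted outlier are assumptions, and any flagged sample should still be checked (for example for quality problems) before it is dropped.

```python
import numpy as np

def flag_outliers(scores, labels, threshold=3.5):
    """Flag samples far from their group's median, using a MAD-based z-score."""
    flags = np.zeros(len(scores), dtype=bool)
    for group in np.unique(labels):
        idx = np.where(labels == group)[0]
        center = np.median(scores[idx], axis=0)
        dist = np.linalg.norm(scores[idx] - center, axis=1)
        mad = np.median(np.abs(dist - np.median(dist)))
        if mad == 0:
            continue                              # no spread, nothing to flag
        robust_z = 0.6745 * (dist - np.median(dist)) / mad
        flags[idx] = robust_z > threshold
    return flags

# Synthetic PC scores with one planted outlier at index 0 (assumption).
rng = np.random.default_rng(2)
scores = np.vstack([
    rng.normal(loc=[5.0, 0.0], scale=1.0, size=(10, 2)),
    rng.normal(loc=[-5.0, 0.0], scale=1.0, size=(10, 2)),
])
labels = np.array(["cancer"] * 10 + ["healthy"] * 10)
scores[0] = [0.0, 15.0]

flags = flag_outliers(scores, labels)
cleaned_scores = scores[~flags]
cleaned_labels = labels[~flags]
```

After removing a sample from the expression matrix, the PCA is usually recomputed, because a strong outlier can dominate one of the leading components.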


Thanks, ATpoint, for your fast and helpful answer! But what do you actually mean by "comparing"? What are the next steps?


I guess you want differential genes? Check tools such as limma-voom.
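limma-voom is an R/Bioconductor workflow, so the sketch below is not limma-voom. It is a deliberately simplified Python stand-in that shows the general idea of a per-gene comparison: a Welch t-test on log-scale expression for each gene, followed by Benjamini-Hochberg correction for multiple testing. The data are synthetic and all variable names are assumptions; for real count data, a dedicated tool such as limma-voom models the mean-variance relationship and is generally the better choice.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Synthetic log-expression matrices, samples x genes (assumption).
rng = np.random.default_rng(0)
n_genes = 200
healthy = rng.normal(8.0, 1.0, size=(10, n_genes))
cancer = rng.normal(8.0, 1.0, size=(10, n_genes))
cancer[:, :20] += 4.0                             # first 20 genes truly differ

log_fc = cancer.mean(axis=0) - healthy.mean(axis=0)
result = stats.ttest_ind(cancer, healthy, axis=0, equal_var=False)
padj = benjamini_hochberg(result.pvalue)

top_genes = np.argsort(padj)[:10]                 # indices of the strongest hits
```

Genes are then typically reported with both `log_fc` and `padj`, so that effect size and statistical evidence can be judged together.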
