Hello,
I am currently analyzing bulk RNA-seq data and have clustered my patients into 3 different clusters based on how similar their transcriptomic profiles are. For all my samples, I have many different phenotypic labels.
My goal now is to check whether one of the identified clusters is enriched for a certain phenotype (for example, being healthy).
My initial idea was to do a simple Fisher test.
As a concrete example, imagine the following scenario:
I have identified 4 different clusters with different numbers of samples in each:
| Cluster | Number of samples |
|---|---|
| 1 | 41 |
| 2 | 32 |
| 3 | 29 |
| 4 | 26 |
I am interested in whether Cluster 1 is enriched for healthy samples. I checked, and 13 of the 41 samples in Cluster 1 are healthy patients; the rest (28) are unhealthy. In the 3 other clusters combined, there are 10 healthy samples and 77 unhealthy samples. Consequently, if I understand everything correctly, the contingency table for my Fisher test should look something like this:
| | Healthy | Unhealthy |
|---|---|---|
| Cluster 1 | 13 | 28 |
| Clusters 2-4 | 10 | 77 |
If I want to test for enrichment, I simply call `fisher.test(contingency_table, alternative="greater")`. If instead I want to test for depletion, I pass `alternative="less"`.
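For reference, here is a minimal sketch of that call in R, using the counts from the example above. The object names (`contingency_table`, `enrichment`, `depletion`) and the row and column labels are only illustrative. With this layout (Cluster 1 in the first row, healthy in the first column), `alternative = "greater"` tests whether the odds of being healthy are higher in Cluster 1 than in the other clusters.

```r
# Counts from the example: rows = cluster membership, columns = health status.
contingency_table <- matrix(
  c(13, 28,   # Cluster 1: healthy, unhealthy
    10, 77),  # Clusters 2-4 combined: healthy, unhealthy
  nrow = 2, byrow = TRUE,
  dimnames = list(
    cluster = c("cluster_1", "other_clusters"),  # illustrative labels
    status  = c("healthy", "unhealthy")
  )
)

# One-sided test for enrichment of healthy samples in Cluster 1 (odds ratio > 1).
enrichment <- fisher.test(contingency_table, alternative = "greater")

# One-sided test for depletion of healthy samples in Cluster 1 (odds ratio < 1).
depletion <- fisher.test(contingency_table, alternative = "less")

print(enrichment$p.value)   # p-value of the enrichment test
print(enrichment$estimate)  # conditional maximum-likelihood estimate of the odds ratio
print(depletion$p.value)    # p-value of the depletion test
```

Note that the direction of `alternative` depends on how the rows and columns are ordered: swapping the two rows or the two columns inverts the odds ratio, so "greater" and "less" would swap meaning.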
I would very much appreciate it if someone could confirm whether this is indeed the way to go, or whether there are more sophisticated and suitable approaches.