Gene Expression Pam Classification Reproducibility Question
10.1 years ago
Mattias Aine ▴ 620

I'm working on recreating the classification of a tumor set using pam in R.

I have a data set obtained from the authors of a recent study.

They perform consensus clustering (ConsensusClusterPlus-package) to derive stable subtypes and use that classification for deriving a classification gene signature using pam.

Using CCP with parameters from the paper I can get a 2-group split with the right number of tumors in both clusters (no RNG-seed was reported in the paper though).

When I use that cluster-split for training with the threshold-parameter from the paper, I get back the correct gene signature with all parameters exactly equal to those published in the supplement of the paper in question.

Using the pamr.predict-function on the data I can also get cluster designations for each tumor sample from pam.

However, the paper shows a cross-table of the CCP cluster designations against the pam designations, and it does not agree with what I see. The CCP assignments seem to be right, but the pam classification is off by 4 samples.
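
A minimal sketch of the comparison I am making, with made-up labels standing in for the real calls (`sample_ids`, `ccp_class` and `pam_class` are hypothetical; the real vectors would come from the CCP result and from `pamr.predict`):

```r
# Hypothetical sample IDs and class calls for 10 tumors (not the real data)
sample_ids <- sprintf("T%02d", 1:10)
ccp_class  <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
pam_class  <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2, 1))

# Cross-table of CCP calls (rows) against pam calls (columns)
xtab <- table(CCP = ccp_class, PAM = pam_class)
print(xtab)

# Samples where the two calls differ; these are the ones to compare
# against the off-diagonal cells of the published table
discordant <- sample_ids[as.character(ccp_class) != as.character(pam_class)]
print(discordant)
```

Note that CCP cluster numbers are arbitrary, so the labels may need to be swapped before the diagonal of the table means "agreement".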

Is pam not a completely deterministic classifier for a given threshold, or is there something I have missed?

Are there parameters downstream of fixing the cutoff-parameter (number of discriminating genes) that influence the cluster designations?
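
To make the question concrete, here is a rough sketch on simulated data of the candidates I can think of once the threshold is fixed: the class priors passed to `pamr.predict`, and cross-validated calls from `pamr.cv` (which depend on randomly drawn folds) being tabulated instead of resubstitution calls. Whether the paper did either is purely an assumption on my part; the data, class sizes and `thr` below are hypothetical, and the code assumes the pamr package is installed.

```r
library(pamr)  # assumes pamr is installed from CRAN

set.seed(1)
# Simulated stand-in for the tumor data: 200 genes x 40 samples, unequal classes
x <- matrix(rnorm(200 * 40), nrow = 200)
y <- factor(rep(c("c1", "c2"), times = c(28, 12)))
x[1:20, y == "c2"] <- x[1:20, y == "c2"] + 1   # 20 genes differ between classes
dat <- list(x = x, y = y,
            geneid    = as.character(seq_len(nrow(x))),
            genenames = paste0("g", seq_len(nrow(x))))

fit <- pamr.train(dat)   # trains over a grid of thresholds
thr <- 1.0               # hypothetical threshold, not the one from the paper

# Resubstitution calls: repeating the call gives the same result
pred_a <- pamr.predict(fit, dat$x, threshold = thr)
pred_b <- pamr.predict(fit, dat$x, threshold = thr)
print(identical(pred_a, pred_b))

# Same fit and threshold, but equal priors instead of the default
# (the default is the class proportions stored in fit$prior)
pred_eq <- pamr.predict(fit, dat$x, threshold = thr, prior = c(0.5, 0.5))
print(table(resub = pred_a, equal_prior = pred_eq))

# Cross-validated calls at the grid threshold closest to thr;
# these depend on the random fold assignment, hence on the RNG seed
cv <- pamr.cv(fit, dat)
ii <- which.min(abs(cv$threshold - thr))
pred_cv <- cv$yhat[, ii]
print(table(resub = pred_a, cv = pred_cv))
```

If the published table was built from cross-validated calls, a handful of samples near the class boundary could plausibly differ from what `pamr.predict` returns on the training data.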

It is unlikely that another 2-group CCP solution for training would be the right answer, as that would change the pam-derived gene signature. To be sure, I ran CCP 500 times with different RNG seeds to see how many alternate solutions with the "right" number of tumors per cluster were out there, and the answer was 1 other (6/500 runs). That one did not reproduce the right gene signature in pamr.

I also used the centroids of the pam genes from the full data and tried nearest-neighbor classification using Pearson, Spearman and Euclidean distance, but no method reproduces the publication cross-table.
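
Roughly, the nearest-centroid check looks like the sketch below, shown here on simulated data (the matrix `x`, the classes `cls` and the helper `nearest_centroid` are hypothetical names, not from the paper). As I understand it, pamr itself scores samples with distances standardized by each gene's within-class standard deviation plus a log-prior term, so none of these three variants is guaranteed to match `pamr.predict`.

```r
set.seed(42)
n_genes <- 30
# Simulated expression for the signature genes: genes in rows, samples in columns
x <- matrix(rnorm(n_genes * 12), nrow = n_genes,
            dimnames = list(paste0("g", 1:n_genes), sprintf("T%02d", 1:12)))
cls <- factor(rep(c("A", "B"), each = 6))
x[1:15, cls == "B"] <- x[1:15, cls == "B"] + 4   # half the genes differ

# Unshrunken class centroids: mean expression per gene within each class
centroids <- sapply(levels(cls), function(k) rowMeans(x[, cls == k, drop = FALSE]))

# Assign each sample to the class whose centroid is closest
nearest_centroid <- function(x, centroids,
                             method = c("pearson", "spearman", "euclidean")) {
  method <- match.arg(method)
  if (method == "euclidean") {
    # samples x classes matrix of Euclidean distances
    d <- sapply(colnames(centroids),
                function(k) sqrt(colSums((x - centroids[, k])^2)))
  } else {
    # correlation distance: 1 - correlation between sample and centroid
    d <- 1 - cor(x, centroids, method = method)
  }
  colnames(d)[apply(d, 1, which.min)]
}

# One column of calls per distance measure
calls <- sapply(c("pearson", "spearman", "euclidean"),
                function(m) nearest_centroid(x, centroids, m))
rownames(calls) <- colnames(x)
print(calls)
```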

It is important for me to reproduce the exact clustering results from the paper in question, which is why I obtained the data from the authors. However, they did not include any cluster calls for individual samples.

I guess the next step is to bug the authors a bit more, but I wanted to check first if I have missed something very obvious.

r cancer classification gene-expression • 2.8k views