Ideas to determine cutoff points for cancer classification
9.4 years ago
Avro ▴ 160

Hi Everyone,

I am using a 36-gene signature to classify human samples into a specific category. All genes in the signature are elevated, and all are weighted equally. I have 997 human samples to which I would like to apply the signature. I was thinking of ranking the samples by adding all the gene values together, so that each sample gets a single value that is easy to analyze. The problem is how to establish a cut-off point for classification (i.e. below a certain rank, the sample is no longer classified as the type of interest). Could someone please suggest ideas? Bootstrap resampling? Percentile analysis? Machine learning? Or any paper doing something similar...
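To make the scoring idea concrete, here is a minimal Python sketch of what I mean, assuming the data is a mapping from sample ID to its 36 gene values. The function names and the toy data are made up for illustration; the random values stand in for real expression measurements.

```python
import random

def signature_scores(expression):
    """Sum each sample's gene values into one equally weighted score.

    `expression` maps a sample ID to a list of values, one per signature gene.
    """
    return {sample: sum(values) for sample, values in expression.items()}

def rank_samples(scores):
    """Return sample IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: 5 hypothetical samples x 36 genes (random numbers, not real expression).
rng = random.Random(0)
expression = {
    f"sample_{i}": [rng.gauss(0, 1) for _ in range(36)] for i in range(5)
}

scores = signature_scores(expression)
ranking = rank_samples(scores)
for rank, sample in enumerate(ranking, start=1):
    print(rank, sample, round(scores[sample], 2))
```

The open question is where in this ranking to draw the line.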

Thank you very much for your time!

Machine-Learning Cutoff Statistics • 2.3k views
9.4 years ago
Ahill ★ 1.9k

I'd recommend starting with a well-established classification method rather than creating a new one. Examples include k-nearest neighbors (kNN), linear discriminant analysis (LDA), and support vector machines (SVMs). All of these (and many others) have R implementations and can work very well for cases like yours.
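To give a feel for what one of these methods does, here is a toy kNN sketch in plain Python. It is only an illustration under assumed data (two made-up "genes" per sample and hypothetical class labels), not one of the R implementations mentioned above, and in practice you would use an established library rather than hand-rolled code.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of `x` by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample, nearest first.
    neighbors = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: 6 samples, 2 expression values each.
train_X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
           [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = ["other", "other", "other",
           "type_of_interest", "type_of_interest", "type_of_interest"]

print(knn_predict(train_X, train_y, [2.9, 3.1]))
print(knn_predict(train_X, train_y, [1.0, 1.0]))
```

Note that this needs labeled training samples, which is true of all the methods listed.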

If you do use a method like the one you describe (adding expression levels), you get a continuous-valued score. Assuming you have two classes, you'll want to choose the cutoff for that score that maximizes a measure such as accuracy, positive predictive value, or negative predictive value in a training set. Which metric is right depends on your needs, so that choice is up to you. For example, if you decide overall accuracy is most important, calculate the prediction accuracy in your training set for every possible cutoff and pick the cutoff that gives the highest accuracy. Then use that same cutoff value when you apply your classifier to your test dataset.

The concept to understand is the receiver operating characteristic (ROC) curve - see this for more details. R packages like ROCR provide ways to work with ROC curves.
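The cutoff scan described above might look roughly like this in Python. This is a sketch under assumptions: the scores and labels are invented, labels are coded 1 for the class of interest and 0 otherwise, a sample is called positive when its score is at or above the cutoff, and the function names are hypothetical. It is not how ROCR does it, just the same idea written out by hand.

```python
def accuracy_at_cutoff(scores, labels, cutoff):
    """Fraction of samples classified correctly when score >= cutoff means positive."""
    correct = sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)

def best_cutoff(scores, labels):
    """Try every observed score as a cutoff; return (cutoff, accuracy) with the best accuracy."""
    candidates = sorted(set(scores))
    # Also allow a cutoff above every score, i.e. nothing is called positive.
    candidates.append(max(scores) + 1.0)
    return max(
        ((c, accuracy_at_cutoff(scores, labels, c)) for c in candidates),
        key=lambda pair: pair[1],
    )

def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each observed cutoff, strictest first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical training set: summed signature scores and known classes.
train_scores = [1.2, 2.5, 3.1, 4.0, 4.8, 5.5, 6.3, 7.0]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, acc = best_cutoff(train_scores, train_labels)
print("chosen cutoff:", cutoff, "training accuracy:", acc)
print("ROC points:", roc_points(train_scores, train_labels))
```

When several cutoffs tie for the best accuracy, this sketch keeps the lowest one; swapping `accuracy_at_cutoff` for a PPV or NPV function changes the criterion without changing the scan. The chosen cutoff should then be applied unchanged to the test set.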

Thank you very much for your input!
