p-values as features in an SVM classifier
5.5 years ago

I am trying to train a binary SVM classifier to identify a disease based on gene expression. I have two classes, disease and healthy, and for each sample the differential gene expression with respect to a control. For now, I use each gene's differential expression as a feature of the SVM.
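A minimal sketch of this setup, assuming scikit-learn and using synthetic numbers in place of real data: each row is one sample, each column is one gene's differential expression (log2 fold change against the control), and the label marks the sample as disease or healthy. The sample and gene counts, the shift applied to the first 10 genes, and the linear kernel are illustrative assumptions, not details from the question.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

n_per_class, n_genes = 30, 200
# Synthetic log2 fold changes (sample vs. control); NOT real data.
healthy = rng.normal(0.0, 1.0, size=(n_per_class, n_genes))
disease = rng.normal(0.0, 1.0, size=(n_per_class, n_genes))
disease[:, :10] += 1.5  # hypothetical: the first 10 genes are shifted in disease

X = np.vstack([healthy, disease])                     # rows = samples, columns = genes (features)
y = np.array([0] * n_per_class + [1] * n_per_class)   # 0 = healthy, 1 = disease

# Standardise each gene, then fit a linear SVM; evaluated with 5-fold cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

The scaler sits inside the pipeline so that the per-gene mean and variance are estimated on the training folds only, not on the samples being scored.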

Each differential expression value is associated with a p-value, and some of them aren't significant. No gene is consistently significant across all experiments, which means I can't simply remove the non-significant genes.

How can I take the significance of the differential expression into account? Would it make sense to add the p-values as features? I also thought about assigning the gene expression an outlier value.


In general, using p-values as features is a bad idea: a p-value is not a measure of the strength of an effect. It only says how (un)likely data at least as extreme as those observed would be if there were no effect, given the assumptions of the test. Use the gene expression values themselves as features. If there is a clear difference between disease and control, you should be able to pick it up with the right kernel. If you also care about which genes discriminate between disease and control, look into feature selection methods.
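A sketch of this advice, assuming scikit-learn and synthetic data: the expression values are the features, the kernel (linear or RBF) and the number of retained genes are tuned by cross-validation, and a simple univariate filter (`SelectKBest` with an ANOVA F-test) stands in for feature selection. That filter is only one option; recursive feature elimination or L1-regularised linear models are common alternatives. The data sizes, effect size, and parameter grid are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic expression matrix (samples x genes); NOT real data.
n_per_class, n_genes = 40, 500
X = rng.normal(size=(2 * n_per_class, n_genes))
y = np.repeat([0, 1], n_per_class)  # 0 = control, 1 = disease
X[y == 1, :15] += 1.0               # hypothetical: 15 informative genes

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),  # univariate filter, refit inside every CV fold
    ("svm", SVC()),
])

# Compare kernels, regularisation strength, and how many genes to keep.
param_grid = {
    "select__k": [10, 50, 200],
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1.0, 10.0],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.2f}")

# Column indices of the genes kept by the selector in the refitted best pipeline.
selected = search.best_estimator_.named_steps["select"].get_support(indices=True)
print("selected gene indices:", selected[:20])
```

Keeping the selector inside the `Pipeline` matters: choosing genes on the full data set before cross-validation leaks information from the test folds and inflates the accuracy. `best_score_` is still somewhat optimistic because it was used to pick the parameters; nested cross-validation gives a less biased estimate.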

