I'm looking for resources (reviews, papers, tutorials) on using machine learning approaches that discriminate between phenotypic or clinical characteristics using omics data as input.

The simplest scenario would be to build a classifier on the gene expression profiles of patients with or without a particular phenotype or clinical characteristic. As in a standard differential expression analysis workflow, one could then explore the most discriminating features/genes from a functional perspective.
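To make that scenario concrete, here is a minimal sketch of what I have in mind, using only NumPy. Everything in it is an assumption for illustration: the expression matrix is simulated (the first 5 of 50 "genes" are shifted in cases), and the function names `simulate_expression`, `fit_logistic` and `predict` are hypothetical. The classifier is a plain L2-penalised logistic regression fitted by gradient descent; in practice one would more likely reach for an established library implementation.

```python
import numpy as np

def simulate_expression(n_samples=200, n_genes=50, n_informative=5, seed=0):
    """Simulated data (assumption): rows are patients, columns are genes."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)        # 0 = control, 1 = case
    X = rng.normal(size=(n_samples, n_genes))
    X[:, :n_informative] += 1.5 * y[:, None]      # shift informative genes in cases
    return X, y

def fit_logistic(X, y, l2=0.1, lr=0.1, n_iter=2000):
    """L2-penalised logistic regression fitted by batch gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p_hat - y) / n + l2 * w)
        b -= lr * np.mean(p_hat - y)
    return w, b

def predict(X, w, b):
    """Predict class 1 when the linear score is positive."""
    return (X @ w + b > 0).astype(int)

X, y = simulate_expression()

# Hold out the last 50 patients; standardise using training statistics only.
train = np.arange(len(y)) < 150
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sd

w, b = fit_logistic(Xs[train], y[train])
accuracy = np.mean(predict(Xs[~train], w, b) == y[~train])

# Rank genes by absolute coefficient, analogous to a top-hits table.
top_genes = np.argsort(-np.abs(w))[:5]
print(f"held-out accuracy: {accuracy:.2f}")
print("top-ranked gene indices:", sorted(top_genes.tolist()))
```

The ranked indices in `top_genes` are what would then be taken forward for functional interpretation (e.g. enrichment analysis), in the same way as a list of differentially expressed genes.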

I give some related answers here:

I even go over how one can predict ethnicity using regression and PCA here: A: How to predict individual ethnicity information by using hapmap data


Whilst 'machine learning' may sound cool, machine learning algorithms do not necessarily outperform standard, well-curated regression analysis for the purposes of building classifiers.

In saying this, some of the most interesting work that I've seen in machine learning was performed in building classifiers of pathogenicity, i.e., for predicting the deleteriousness of genetic variants. For example:

  • CADD, model training using SVM with linear kernel
  • DANN, implements a deep neural network with “an input layer, a sigmoid function output layer, and three 1000-node hidden layers with hyperbolic tangent activation function”
  • FATHMM-MKL, model training using SVM involving base kernels and then composite kernels
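To give a feel for the network shape quoted for DANN above, here is a rough NumPy sketch of a forward pass through three 1000-node tanh hidden layers followed by a single sigmoid output. This is not DANN's code: the weights are random and untrained, the input size of 20 features is a placeholder I picked, and `init_network` and `forward` are hypothetical names.

```python
import numpy as np

def init_network(n_features, hidden_sizes=(1000, 1000, 1000), seed=0):
    """Random (untrained) weights for input -> 3 hidden layers -> 1 output."""
    rng = np.random.default_rng(seed)
    sizes = (n_features, *hidden_sizes, 1)
    return [
        (rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """tanh activation in the hidden layers, sigmoid on the output layer."""
    h = X
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W_out, b_out = params[-1]
    return (1.0 / (1.0 + np.exp(-(h @ W_out + b_out)))).ravel()

# Placeholder input: 4 variants, each described by 20 made-up annotation features.
params = init_network(n_features=20)
X = np.random.default_rng(1).normal(size=(4, 20))
scores = forward(params, X)   # one score in (0, 1) per variant
print(scores.shape)
```

With trained weights, each output would be read as a deleteriousness score for the corresponding variant; here the values are meaningless because nothing has been fitted.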


