5.5 years ago
mas
I am interested in comparing the Consensus Molecular Subtypes (CMS) labels from the random forest and single sample predictor methods in the CMSclassifier package. I have RNA-seq data as raw counts, as output by HTSeq. CMSclassifier::classifyCMS.RF requires as input "log2_scaled Gene Expression Profiles (GEP) data values". Is it sufficient to log2-transform the raw HTSeq counts, or would it be more appropriate to also quantile-normalize the log2 values, as in the CMScaller package? Or is there a more suitable normalization that you would recommend in this setting?
Thanks!
I would go for either rlog or vst transformation, as recommended for downstream analysis in the DESeq2 manual for classification/clustering and machine learning applications.
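
A minimal sketch of what this could look like in R, assuming DESeq2 is installed. The count matrix here is simulated with makeExampleDESeqDataSet as a stand-in for real HTSeq output, and the object names (cts, coldata, expr) are just placeholders. Since no experimental design is needed for classification, I use design = ~ 1 and blind = TRUE.

```r
library(DESeq2)

set.seed(1)

# Stand-in for a genes x samples matrix of raw HTSeq counts (simulated here;
# with real data you would read your own count table instead).
cts <- counts(makeExampleDESeqDataSet(n = 2000, m = 6))

# Minimal sample annotation; no covariates are needed for the transformation.
coldata <- data.frame(sample = colnames(cts), row.names = colnames(cts))

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData   = coldata,
                              design    = ~ 1)

# Variance-stabilizing transformation; values are on a log2-like scale.
# rlog(dds, blind = TRUE) is the alternative, but it is slower on many samples.
vsd  <- vst(dds, blind = TRUE)
expr <- assay(vsd)   # genes x samples matrix of transformed values

# expr is what you would then hand to the classifier. Check the CMSclassifier
# documentation for the gene identifiers it expects as row names before
# calling CMSclassifier::classifyCMS.RF on it.
```

With real data, DESeqDataSetFromHTSeqCount can build the dataset directly from the per-sample HTSeq files instead of going through a count matrix.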