I have a data set of 1028 significantly differentially expressed genes from my DESeq2 analysis. I also have 5 physiological measurements for my organism of interest, across a total of 35 samples.
I ran a random forest analysis using rfsrc() from the randomForestSRC package. My y (response) variables are the phys measurements (3 numeric, 2 categorical), and my x-variables are the genes (1028 numeric). I have output, but I am struggling to interpret the results for my training and test datasets, and to visualize a tree from the forest.
I tried ggRandomForests, but it does not appear to be set up for the multivariate families (regr+) of randomForestSRC.
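For readers who want to reproduce this kind of setup, here is a minimal sketch using simulated data. It is not the actual analysis: the gene names, the response names other than Biomass.z, the factor levels, and the reduced number of genes (50 instead of 1028) are all placeholders, and the ntree and nodesize values are simply copied from the printed summary.

```r
library(randomForestSRC)

set.seed(1)
n <- 35   # samples, as in the post
p <- 50   # simulated "genes" (far fewer than the real 1028, to keep this quick)

# Simulated expression matrix; gene names are placeholders
genes <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
names(genes) <- paste0("gene", seq_len(p))

# Hypothetical responses: 3 numeric, 2 categorical (categorical ones as factors)
phys <- data.frame(
  Biomass.z = rnorm(n),
  Trait2.z  = rnorm(n),
  Trait3.z  = rnorm(n),
  GroupA    = factor(sample(c("low", "high"), n, replace = TRUE)),
  GroupB    = factor(sample(c("yes", "no"), n, replace = TRUE))
)
dat <- cbind(phys, genes)

# 80/20 split: 28 training samples, 7 test samples
train_idx <- sample(n, 28)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Multivariate mixed forest: all five responses on the left-hand side
RFmodel <- rfsrc(Multivar(Biomass.z, Trait2.z, Trait3.z, GroupA, GroupB) ~ .,
                 data = train, ntree = 1000, nodesize = 3, importance = TRUE)
RFpred <- predict(RFmodel, newdata = test)

print(RFmodel)
print(RFpred)
```

Because the simulated responses are pure noise, the error values from this sketch carry no meaning; it only shows the shape of the call.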
Basically, I want to know:
1) How do I know if my model is correct?
2) How do I determine which genes were the best predictors of the response (y) variables?
3) How do I visualize a decision tree from the forest, in order to see how the terminal nodes were decided?
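As a point of reference for questions 2 and 3, randomForestSRC provides get.mv.vimp() for per-response variable importance and get.tree() for extracting a single tree. The sketch below uses the built-in mtcars data as a stand-in for the gene data (an assumption made only to keep it runnable); averaging importance across responses is one possible ranking, not a recommendation from the package authors. Plotting a tree assumes the data.tree and DiagrammeR packages are installed.

```r
library(randomForestSRC)

# Stand-in data: mtcars with one numeric and one factor response
d <- mtcars
d$cyl <- factor(d$cyl)

set.seed(1)
fit <- rfsrc(Multivar(mpg, cyl) ~ ., data = d, ntree = 500, importance = TRUE)

# (2) Variable importance: one row per x-variable, one column per response
vi <- get.mv.vimp(fit, standardize = TRUE)
print(vi)

# Rough overall ranking: average importance across the responses
print(head(sort(rowMeans(vi), decreasing = TRUE), 5))

# (3) Plot a single tree (here tree 1) for one chosen response.
# Only attempted in an interactive session with the plotting packages present.
if (interactive() &&
    requireNamespace("data.tree", quietly = TRUE) &&
    requireNamespace("DiagrammeR", quietly = TRUE)) {
  plot(get.tree(fit, tree.id = 1, m.target = "mpg"))
}
```

Note that importance = TRUE has to be set when the forest is grown for get.mv.vimp() to have anything to return, and that any single tree is only one of many in the forest, so its splits need not reflect the ensemble.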
I've reviewed Udaya Kogalur & Hemant Ishwaran's webpage (https://kogalur.github.io/randomForestSRC/theory.html) as well as other websites/forums but am still having trouble understanding how to proceed.
My summaries for the training set (80% of dataset) and test set (20% of dataset) are below:
    > print(RFmodel)
    Sample size: 28
    Number of trees: 1000
    Forest terminal node size: 3
    Average no. of terminal nodes: 5.68
    No. of variables tried at each split: 33
    Total no. of variables: 1028
    Total no. of responses: 5
    User has requested response: Biomass.z
    Resampling used to grow trees: swor
    Resample size used to grow trees: 18
    Analysis: mRF-RC
    Family: mix+
    Splitting rule: mv.mix *random*
    Number of random split points: 10
    % variance explained: -0.23
    Error rate: 0.71

    > print(RFpred)
    Sample size of test (predict) data: 7
    Number of grow trees: 1000
    Average no. of grow terminal nodes: 5.68
    Total no. of grow variables: 1028
    Total no. of grow responses: 5
    User has requested response: Biomass.z
    Resampling used to grow trees: swor
    Resample size used to grow trees: 4
    Analysis: mRF-RC
    Family: mix+
    % variance explained: 15.77
    Test set error rate: 2.84
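Both summaries report performance for a single response only (Biomass.z). A minimal sketch of pulling the error for every response at once with get.mv.error(), again using mtcars as a stand-in for the real data (an assumption for illustration), might look like this:

```r
library(randomForestSRC)

# Stand-in data: mtcars with one numeric and one factor response
d <- mtcars
d$cyl <- factor(d$cyl)

set.seed(1)
train_idx <- sample(nrow(d), 25)
fit  <- rfsrc(Multivar(mpg, cyl) ~ ., data = d[train_idx, ], ntree = 500)
pred <- predict(fit, newdata = d[-train_idx, ])

# Out-of-bag error on the training data, one value per response
err_train <- get.mv.error(fit, standardize = TRUE)

# Error on the held-out rows (the responses must be present in newdata)
err_test <- get.mv.error(pred, standardize = TRUE)

print(err_train)
print(err_test)
```

With only 7 test samples, as in the summary above, the test-set values can be expected to vary considerably from one random split to another.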