Ranking genes by importance rank from a random forest classification model
2.4 years ago
pbigbig ▴ 250

Hi everyone,

I have gene expression data from two cohorts, Case and Control. There are about four times as many Control samples as Case samples. I would like to run a random forest to select genes (features) that strongly discriminate Case from Control.

Because Control samples are so much more abundant, my plan is to randomly subsample the Control cohort n times (keeping the Case cohort the same each time), fit a random forest on each subsample, and obtain n lists of feature importance. The sum of each feature's ranks across the n lists would then serve as the final ranking.
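
A minimal Python sketch of this plan, in case it helps make the idea concrete. Everything here is an assumption for illustration: the function names, the choice of 500 trees, drawing as many controls as there are cases in each round, and the use of scikit-learn's `RandomForestClassifier` with its built-in `feature_importances_`. The importance function is passed in as a parameter so that a different measure (for example permutation importance) could be swapped in.

```python
import random
from collections import defaultdict


def rf_importance(X, y, seed):
    """Fit a random forest and return one importance value per feature.

    Assumes scikit-learn is installed; it is imported lazily so the rest
    of the sketch has no hard dependency on it.
    """
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(n_estimators=500, random_state=seed)
    model.fit(X, y)
    return list(model.feature_importances_)


def ranks_from_importance(importance):
    """Rank features so the most important gets rank 1 (ties broken by index)."""
    order = sorted(range(len(importance)), key=lambda j: -importance[j])
    ranks = [0] * len(importance)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def rank_sum_over_subsamples(cases, controls, genes, n_rounds=50,
                             importance_fn=rf_importance, seed=0):
    """Sum per-gene importance ranks over repeated balanced subsamples.

    cases, controls: lists of samples, each a list of expression values
    ordered like `genes`. Every round keeps all cases and draws
    len(cases) controls without replacement.
    """
    rng = random.Random(seed)
    rank_sum = defaultdict(int)
    for round_idx in range(n_rounds):
        sampled = rng.sample(controls, len(cases))
        X = cases + sampled
        y = [1] * len(cases) + [0] * len(sampled)
        importance = importance_fn(X, y, seed + round_idx)
        for gene, rank in zip(genes, ranks_from_importance(importance)):
            rank_sum[gene] += rank
    # Smallest rank sum = most consistently important gene.
    return sorted(rank_sum.items(), key=lambda kv: kv[1])
```

Calling `rank_sum_over_subsamples(cases, controls, genes)` returns `(gene, rank_sum)` pairs with the most consistently top-ranked gene first. With thousands of genes, many features will have near-zero importance in any single forest, so ranks in the lower part of each list are largely noise; only the top of the summed ranking is likely to be meaningful.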

Is this approach feasible, and is there any previously published study that did the same? I am very new to machine learning, so detailed explanations or suggestions are very welcome.

Thank you very much.

selection random analysis feature forest transcriptome • 813 views
2.4 years ago

Approaches similar to the one you describe have been used before. For example, in this article:

Feng, Z., Qu, J., Liu, X. et al. Integrated bioinformatics analysis of differentially expressed genes and immune cell infiltration characteristics in Esophageal Squamous cell carcinoma. Sci Rep 11, 16696 (2021)

the authors employed the so-called robust rank aggregation (RRA) algorithm. If you use R, there is a ready-to-use implementation of the algorithm in the RobustRankAggreg package.
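
To give a feel for how rank aggregation of this kind differs from a plain rank sum, here is a simplified Python sketch of the core idea as I understand it: each gene's ranks are normalized to (0, 1], compared against what uniformly random ranks would give via order statistics, and the smallest resulting probability is kept with a Bonferroni-style correction. This is not the RobustRankAggreg code; the function names are made up for illustration, and it assumes every list contains the same set of genes.

```python
from math import comb


def rho_score(normalized_ranks):
    """Simplified RRA-style score for one gene (smaller = more significant).

    normalized_ranks: the gene's rank in each list divided by the list
    length, so every value lies in (0, 1]. Under the null hypothesis the
    values are uniform, and the k-th smallest of n follows Beta(k, n-k+1).
    """
    r = sorted(normalized_ranks)
    n = len(r)
    best = 1.0
    for k, x in enumerate(r, start=1):
        # P(k-th order statistic of n uniforms <= x)
        #   = P(at least k of n uniforms <= x), a binomial tail sum.
        p = sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(k, n + 1))
        best = min(best, p)
    # Bonferroni-style correction for taking the minimum over n values.
    return min(1.0, best * n)


def aggregate(rank_lists):
    """rank_lists: gene lists, each ordered from most to least important.

    Assumes every list contains exactly the same genes.
    """
    scores = {}
    for gene in set(rank_lists[0]):
        norm = [(lst.index(gene) + 1) / len(lst) for lst in rank_lists]
        scores[gene] = rho_score(norm)
    return sorted(scores.items(), key=lambda kv: kv[1])
```

For real analyses the R package is the safer choice; the sketch is only meant to show why a gene that is near the top in most lists scores well even if a few lists rank it poorly.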

Thank you very much. I will check their method.
