I have some RNA-seq data with two classes (cancer/normal), and I ran DESeq2 on it to obtain significantly differentially expressed (DE) genes. My lab wants to see how well these 'significantly DE' genes can classify cancer vs. normal samples, and is using the AUC score and ROC curve to visualize performance.
Does it make sense to train the random forest on that same pre-defined list of DE genes, instead of using a feature selection method? Could this artificially inflate the AUC if I evaluate the model with leave-one-out cross-validation (LOOCV)?
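To make the concern concrete, here is a minimal sketch, assuming Python with NumPy and scikit-learn. The data are pure noise (the labels have no relationship to the features), and `SelectKBest(f_classif)` is a hypothetical stand-in for DESeq2 as the gene-selection step, since DESeq2 itself is an R package that works on counts. Version (a) picks genes using all samples and then runs LOOCV, which mirrors training on a DE list computed from the full dataset. Version (b) repeats the selection inside every fold using only that fold's training samples. If (a) scores well above 0.5 on data with no real signal, that gap comes from the held-out sample having influenced which genes were chosen.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical pure-noise data: 40 samples x 2000 "genes", labels unrelated to X.
rng = np.random.default_rng(0)
n_samples, n_genes, k = 40, 2000, 20
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0, 1] * (n_samples // 2))  # 0 = normal, 1 = cancer (balanced)

loo = LeaveOneOut()


def make_rf():
    return RandomForestClassifier(n_estimators=200, random_state=0)


# (a) Leaky: choose the top-k genes using ALL samples, then run LOOCV on that subset.
selected = SelectKBest(f_classif, k=k).fit(X, y).get_support()
proba_leaky = cross_val_predict(
    make_rf(), X[:, selected], y, cv=loo, method="predict_proba"
)[:, 1]

# (b) Honest: the selector sits inside the pipeline, so it is refit on each
# fold's training samples only and never sees the held-out sample.
pipe = make_pipeline(SelectKBest(f_classif, k=k), make_rf())
proba_honest = cross_val_predict(pipe, X, y, cv=loo, method="predict_proba")[:, 1]

# Each LOOCV fold holds out a single sample, so a per-fold AUC is undefined;
# pool the held-out probabilities and compute one AUC over all samples.
auc_leaky = roc_auc_score(y, proba_leaky)
auc_honest = roc_auc_score(y, proba_honest)
print(f"leaky AUC:  {auc_leaky:.2f}")
print(f"honest AUC: {auc_honest:.2f}")
```

On noise like this, (a) would be expected to come out far higher than (b), even though neither model has anything real to learn. With real data, the analogous honest setup would rerun the DE analysis (or whatever selection step is used) within each training fold.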
Please let me know any thoughts, questions, or concerns you may have. I am fairly inexperienced and want to learn as much as I can, but I am under pressure to deliver without any guidance or mentorship.
Thank you for your time, J