I have some RNA-seq data with two classes (cancer/normal) that I ran DESeq2 on to obtain significantly DE genes. My lab is interested in seeing how well our 'significantly DE' genes can classify cancer/normal samples, and is using the AUC score and ROC curve to visualize the performance.
Does it make sense to train a random forest on that same pre-defined gene list instead of using a feature selection method? Can this artificially inflate the AUC scores if used with leave-one-out cross-validation (LOOCV)?
Please let me know any thoughts, questions, or concerns you may have. I am fairly untrained and want to learn as much as I can (but am under pressure to deliver with no guidance or mentorship).
Thank you for your time, J
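To see the LOOCV inflation concretely, here is a small simulation sketch (Python, standard library only). It is not the lab's pipeline, and several pieces are assumptions: a Welch t-test stands in for DESeq2, a nearest-centroid classifier stands in for the random forest, and the data are pure Gaussian noise with no real cancer/normal signal, so any AUC well above 0.5 is an artifact. All function names are hypothetical. It compares selecting the top genes once on all samples and then running LOOCV, against repeating the selection inside each LOOCV fold.

```python
import math
import random

def welch_t(values, labels):
    """Welch t statistic (class 1 minus class 0) for one feature.
    Stand-in for a DE test such as DESeq2 (assumption, not the real method)."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the largest |t| on the samples given."""
    n_feat = len(X[0])
    scores = [abs(welch_t([row[j] for row in X], y)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j], reverse=True)[:k]

def centroid_score(train_X, train_y, x, feats):
    """Nearest-centroid score (stand-in for the random forest):
    positive means x is closer to the class-1 centroid."""
    def centroid(label):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    c0, c1 = centroid(0), centroid(1)
    d0 = sum((x[j] - m) ** 2 for j, m in zip(feats, c0))
    d1 = sum((x[j] - m) ** 2 for j, m in zip(feats, c1))
    return d0 - d1

def auc(scores, labels):
    """Mann-Whitney AUC: P(positive scores higher than negative), ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loocv_auc(X, y, k, select_inside_fold):
    # Leaky variant: genes chosen once using ALL samples, including each test sample.
    fixed = None if select_inside_fold else select_top_k(X, y, k)
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Honest variant: genes re-chosen using only the training samples of this fold.
        feats = select_top_k(train_X, train_y, k) if select_inside_fold else fixed
        scores.append(centroid_score(train_X, train_y, X[i], feats))
    return auc(scores, y)

random.seed(0)
n_per_class, n_features, k = 10, 2000, 10
y = [0] * n_per_class + [1] * n_per_class
# Pure noise: no feature is truly associated with the label.
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in y]

leaky = loocv_auc(X, y, k, select_inside_fold=False)
nested = loocv_auc(X, y, k, select_inside_fold=True)
print(f"selection outside LOOCV: AUC = {leaky:.2f}")
print(f"selection inside LOOCV:  AUC = {nested:.2f}")
```

On noise data like this, the first setup tends to report an AUC close to 1 while the second stays around 0.5. The same bias applies whenever the DE gene list was derived from the very samples later used to estimate the AUC, regardless of which classifier is trained on it.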
Thank you for your reply! I greatly appreciate it. I agree with your last point that this is not very informative... but I gotta do as I'm told for now :/
One last question: for testing the classifier on an unseen dataset, can it be any tumor/normal tissue dataset, or does it have to be from the same tissue type (parathyroid in this case)?
Ideally the same tissue type; your classifier is unlikely to perform well on a different tissue type (though you can try). Generally, training on apples and testing on oranges does not yield good results.