If you pick the genes/features using your full set of samples, you won't have independent training / validation datasets (which I would argue is the main reason to test predictability with a machine learning method, versus a statistical test in a smaller set of samples).
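To make that concrete, here is a minimal sketch (scikit-learn, with simulated data, so the names and sizes are just placeholders) of keeping feature selection inside the cross-validation loop, so the selected genes are chosen only from the training samples of each fold:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))   # toy data: 60 samples x 5000 genes
y = rng.integers(0, 2, size=60)   # binary labels

# Selection is a pipeline step, so each CV fold re-selects genes using only
# its own training samples -- the held-out samples never influence the choice.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5))
print(scores.mean())
```

If you instead ran SelectKBest on the full matrix before splitting, the validation accuracy would be optimistically biased.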
If you are using another dataset for your validation, that might be OK. I would typically use some sort of normalized expression (such as Counts-Per-Million or Reads-Per-Kilobase-per-Million), but your features have to change with a sufficiently strong difference to stand out from other factors (such as the library preparation method, unrelated biological differences between the samples, etc.). In other words, you have to be picking up differences that are greater than the typical variability due to confounding factors.
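For reference, the CPM calculation itself is just a per-sample library-size scaling; this is a rough sketch of the arithmetic on a toy counts matrix (in practice you would likely use something like edgeR's cpm() instead):

```python
import numpy as np

counts = np.array([[10, 200, 5],
                   [ 0,  50, 3],
                   [90, 750, 2]], dtype=float)  # toy genes x samples counts

library_sizes = counts.sum(axis=0)   # total counts per sample
cpm = counts / library_sizes * 1e6   # scale each sample to counts per million
log_cpm = np.log2(cpm + 1)           # common log transform with a pseudocount
print(log_cpm)
```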
Even though this is something I only came to fully appreciate after publication, I think methods like Leave-One-Out Cross-Validation are actually not that great, because you either i) violate the independence of the validation with upstream feature selection and/or ii) define a different model for each sample (meaning you don't have one model that you can test on new samples).
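Point (ii) is easy to see in code: leave-one-out refits the model once per held-out sample, so you end up with n fitted models rather than a single model you could carry forward to a new cohort (again a toy sketch with simulated data):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 100))
y = rng.integers(0, 2, size=20)

models = []
for train_idx, test_idx in LeaveOneOut().split(X):
    m = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    models.append(m)

print(len(models))  # 20 separate models, one per held-out sample
```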
So, my recommendation would be either i) split your data into thirds (1/3 for training and two separate 1/3 validation sets), or ii) perform an analysis similar to what you have described and use another large cohort for validation.
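For option (i), a minimal sketch of producing one training third and two separate validation thirds with two successive train_test_split calls (the sizes and stratification are just one reasonable choice, not the only way to do it):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(90, 5000))
y = rng.integers(0, 2, size=90)

# First split off 1/3 for training, then split the remainder in half
# to get two independent validation sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=1/3, stratify=y, random_state=0)
X_val1, X_val2, y_val1, y_val2 = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(X_train.shape, X_val1.shape, X_val2.shape)
```

Features would then be selected and the model fit on X_train only, with X_val1 and X_val2 each used once as untouched validation sets.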