I have miRNA expression data from RNA-seq. I ran differential expression analysis and machine learning methods to find a signature that distinguishes patients from controls. On the training cohort the results look great, but on the validation cohort they are poor (AUC < 0.6). In the PCA, the training batch and the validation batch separate from each other (not completely, but there is certainly a batch effect).
I have been told that when this kind of problem arises, it usually helps to apply some kind of normalization to the samples. I have applied CPM normalization, but I don't know which other normalization methods could improve the results. I have seen that there are programs for finding endogenous controls for normalization (NormFinder, geNorm), but I have not found out how to use them with RNA-seq data in R.
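To make clear what I mean by CPM, here is a minimal sketch of the calculation in Python. It is not my actual pipeline: the function name `cpm`, the sample names, the miRNA names and the counts are all made up for illustration. Each raw count is divided by the total count of its sample and multiplied by one million.

```python
def cpm(counts):
    """Convert raw counts to counts per million (CPM).

    counts: dict mapping sample name -> dict of miRNA name -> raw count.
    Returns a dict with the same layout holding CPM values.
    """
    result = {}
    for sample, mirna_counts in counts.items():
        # Library size = total number of reads assigned in this sample.
        library_size = sum(mirna_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        result[sample] = {
            mirna: count * 1e6 / library_size
            for mirna, count in mirna_counts.items()
        }
    return result


# Toy data (hypothetical): the second sample has the same composition
# as the first, but was sequenced five times less deeply.
toy_counts = {
    "train_1": {"miR-a": 50, "miR-b": 150, "miR-c": 800},
    "valid_1": {"miR-a": 10, "miR-b": 30, "miR-c": 160},
}

cpm_values = cpm(toy_counts)
for sample, values in cpm_values.items():
    print(sample, values)
```

In this toy case both samples end up with identical CPM values, because they differ only in sequencing depth. As far as I understand, that is all CPM corrects for, so it would not by itself remove the separation between batches that I see in the PCA.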
Is there a current "gold standard" method for miRNA normalization?
Thank you very much, Lluc