I would like to apply machine learning methods to RNA-seq data from the TCGA dataset for survival time analysis. The samples have to be comparable, so I understand I should use a between-sample normalization method, such as the median-of-ratios size factors in DESeq2.
I would like to split my dataset into training and test subsets.
1) Is it possible to normalize only the training set with DESeq2 and then apply that normalization to the test set, so that the test samples do not affect the normalization of the training set?
2) Will normalizing the training and test subsets separately make the training and test samples non-comparable?
3) Are there other between-sample normalization methods that are better suited to this purpose than upper quartile normalization?
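For question 1, one way to keep the test samples out of the training normalization is to freeze the reference used by median-of-ratios normalization. The per-gene geometric means are computed from the training samples only, and each test sample's size factor is then computed against those same means. The sketch below is a plain NumPy reimplementation of that idea, assuming counts are stored as a genes x samples array of raw counts. The function names and the simulated data are hypothetical, and this is not the DESeq2 API. In R, DESeq2's `estimateSizeFactors()` accepts a `geoMeans` argument for supplying externally computed geometric means, which serves a similar purpose.

```python
import numpy as np


def log_geometric_means(train_counts):
    """Per-gene log geometric mean across training samples.

    train_counts: genes x samples array of raw counts. Genes with a zero
    count in any training sample get -inf and are ignored later.
    """
    with np.errstate(divide="ignore"):
        return np.log(train_counts).mean(axis=1)


def size_factors(counts, log_ref):
    """Median-of-ratios size factor for each column of `counts`,
    computed against a fixed reference (training log geometric means)."""
    usable = np.isfinite(log_ref)
    with np.errstate(divide="ignore"):
        log_ratios = np.log(counts[usable]) - log_ref[usable][:, None]
    factors = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        col = log_ratios[:, j]
        col = col[np.isfinite(col)]  # drop genes with zero count in this sample
        factors[j] = np.exp(np.median(col))
    return factors


# Hypothetical simulated data: 200 genes, 8 training and 3 test samples.
rng = np.random.default_rng(0)
train = rng.poisson(50, size=(200, 8)).astype(float)
test = rng.poisson(50, size=(200, 3)).astype(float)
test[:, 0] *= 2  # pretend this test sample was sequenced twice as deeply

log_ref = log_geometric_means(train)   # reference from training data only
sf_train = size_factors(train, log_ref)
sf_test = size_factors(test, log_ref)  # test scaled against the same reference

train_norm = train / sf_train
test_norm = test / sf_test
```

Because the reference comes only from the training samples, adding or removing test samples never changes the training size factors, and both subsets are scaled against the same reference, which also bears on question 2.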
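For question 3 (and again question 2), it may help to see why upper quartile normalization behaves differently from a shared-reference method. In its simplest form, each sample is divided by the 75th percentile of its own nonzero counts, so one sample's scaling does not depend on which other samples are in the matrix. The sketch below is a simplified, hypothetical NumPy version under that assumption. The fixed `target` constant is an arbitrary choice, and this is not the exact procedure of any particular package.

```python
import numpy as np


def upper_quartile_normalize(counts, target=1000.0):
    """Scale each sample (column) so that the 75th percentile of its
    nonzero counts equals `target`, a fixed, arbitrary constant."""
    counts = np.asarray(counts, dtype=float)
    out = np.empty_like(counts)
    for j in range(counts.shape[1]):
        col = counts[:, j]
        uq = np.percentile(col[col > 0], 75)  # upper quartile of nonzero counts
        out[:, j] = col / uq * target
    return out


# Hypothetical counts: 100 genes x 6 samples; first 4 "train", last 2 "test".
rng = np.random.default_rng(1)
counts = rng.negative_binomial(5, 0.1, size=(100, 6)).astype(float)

together = upper_quartile_normalize(counts)
separate = np.hstack([upper_quartile_normalize(counts[:, :4]),
                      upper_quartile_normalize(counts[:, 4:])])
```

In this simple form, each sample gets the same result whether it is normalized with the full cohort or only with its own subset. The trade-off is the assumption that the upper quartile is a stable reference in every sample.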