I am working on a project where we want to classify cancer patients based on a risk score. The score is based on the expression of selected genes that my group identified previously. I have mined databases and selected a few microarray datasets to work with; the idea is to combine datasets to get a strong separation of risk groups.
In my current pipeline, I score the samples from the normalised (z-scored) intensities, after batch correction of course. The group I work with might receive new samples in the future for which we will have to calculate the risk score.
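To make the single-sample issue concrete, here is a minimal Python/NumPy sketch. It assumes the z-scores are computed per gene across the samples of one dataset and that the risk score is a weighted sum of those z-scores; the weights, the toy intensities and the helper names (`zscore_within_cohort`, `fit_reference`, `zscore_against_reference`) are hypothetical and not taken from the actual pipeline. It shows that a within-dataset z-score is undefined for a lone sample, and one possible workaround: freezing the per-gene mean and SD from a reference cohort and reusing them for new samples. That workaround only makes sense if the new sample has been normalised comparably to the reference data.

```python
import warnings

import numpy as np

# Hypothetical signature weights, one coefficient per gene (illustration only).
WEIGHTS = np.array([0.8, -0.5, 1.2])


def zscore_within_cohort(expr):
    """Z-score each gene (column) across the samples (rows) of one dataset."""
    mean = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    return (expr - mean) / sd


def risk_score(z, weights=WEIGHTS):
    """Weighted sum of per-gene z-scores, giving one score per sample."""
    return z @ weights


def fit_reference(expr):
    """Store per-gene mean and SD from a reference cohort (hypothetical helper)."""
    return expr.mean(axis=0), expr.std(axis=0, ddof=1)


def zscore_against_reference(expr, mean, sd):
    """Z-score one or more samples using frozen reference parameters."""
    return (np.atleast_2d(expr) - mean) / sd


# Toy batch-corrected log intensities: 4 samples x 3 genes (made-up numbers).
cohort = np.array([
    [7.1, 5.0, 9.2],
    [6.8, 5.4, 8.7],
    [7.9, 4.6, 9.9],
    [6.5, 5.9, 8.1],
])
cohort_scores = risk_score(zscore_within_cohort(cohort))

# A lone sample: with one row the sample SD (ddof=1) is undefined, so NumPy
# returns nan (and emits RuntimeWarnings, silenced here) and the score is nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    single_scores = risk_score(zscore_within_cohort(cohort[:1]))

# Possible workaround: freeze mean/SD from the reference cohort, reuse them later.
ref_mean, ref_sd = fit_reference(cohort)
new_sample = np.array([7.3, 5.1, 9.0])
new_score = risk_score(zscore_against_reference(new_sample, ref_mean, ref_sd))
```

Scoring the reference cohort against its own frozen parameters reproduces the within-dataset scores, so the same scoring function could in principle serve both whole datasets and isolated samples.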
What would be the best approach for scoring a single sample/patient? I would like my pipeline to work both on whole datasets and on isolated samples. I would appreciate any suggestions you might have.
Thank you!