I have RNA-Seq data from a set of cancer samples, which I analyzed to generate RPKM and TPM values (TPM calculated from RPKM) for all genes. Additionally, I downloaded a large set of public cancer samples (multiple cancer types) to use as comparison data, and generated RPKM and TPM for them in the same way.
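For reference, this is roughly how I derive TPM from RPKM: each sample's RPKM values are rescaled so that they sum to one million. It is a minimal sketch; the function name `rpkm_to_tpm` and the toy values are only for illustration and are not from my real pipeline.

```python
import numpy as np

def rpkm_to_tpm(rpkm):
    """Convert one sample's RPKM vector to TPM.

    TPM_i = RPKM_i / sum(RPKM) * 1e6, so every sample sums to 1e6.
    """
    rpkm = np.asarray(rpkm, dtype=float)
    total = rpkm.sum()
    if total == 0:
        raise ValueError("all RPKM values are zero; TPM is undefined")
    return rpkm / total * 1e6

# Toy example with three hypothetical genes.
tpm = rpkm_to_tpm([10.0, 30.0, 60.0])
```

Because every sample ends up with the same total, the relative abundances are on a common scale across samples.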
What I want to do is find the public sample that best matches each of my samples (in terms of cancer type) by comparing RPKM/TPM profiles across a set of N genes of interest (or genes known to be expressed in each cancer type). I read that RPKM is a bad idea for comparing across samples, so I switched to TPM for this task.
For each of my samples, I take the TPM values of the N genes and compare their distribution with that of each public sample to get a p-value (e.g., with a KS test). But this does not seem to be the best approach.
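To make the current approach concrete, here is a minimal sketch of it using `scipy.stats.ks_2samp`. The function name `best_ks_match` and the random toy data are hypothetical; my real input is a matrix of TPM values with one row per public sample and one column per gene. The last lines show a property of the two-sample KS test that may be part of the problem: it compares only the pooled distributions of values, so shuffling which gene each value belongs to does not change the p-value.

```python
import numpy as np
from scipy.stats import ks_2samp

def best_ks_match(query_tpm, public_tpm):
    """Return (index, p-value) of the public sample with the highest KS p-value.

    query_tpm:  shape (N,), TPM of the N genes in one of my samples
    public_tpm: shape (n_samples, N), TPM of the same genes in public samples
    """
    pvals = np.array([ks_2samp(query_tpm, row).pvalue for row in public_tpm])
    best = int(np.argmax(pvals))
    return best, float(pvals[best])

# Toy data only: 5 hypothetical public samples, 50 genes.
rng = np.random.default_rng(0)
public = rng.lognormal(mean=3.0, sigma=1.0, size=(5, 50))
query = rng.lognormal(mean=3.0, sigma=1.0, size=50)

best_idx, best_p = best_ks_match(query, public)

# Shuffling the gene order of the query leaves the KS p-value unchanged,
# because the test ignores which gene each value came from.
shuffled = rng.permutation(query)
p_original = ks_2samp(query, public[0]).pvalue
p_shuffled = ks_2samp(shuffled, public[0]).pvalue
```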
Can anyone point me to other tests that would be useful in my case? Or could one, for example, build a model of the TPM values of all cancer samples of the same type, and then compare my samples against each of these models?
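To illustrate what I mean by a per-type model, here is a rough sketch of one possible version: average the public samples of each cancer type into a centroid profile, then score each of my samples against every centroid with a gene-wise rank correlation. The choice of log2(TPM + 1), Spearman correlation, and the names `build_centroids` and `classify` are all my own assumptions for illustration; I do not know whether this is the recommended way.

```python
import numpy as np
from scipy.stats import spearmanr

def build_centroids(public_tpm, labels):
    """Mean log2(TPM + 1) profile per cancer type (one vector of N genes each)."""
    log_tpm = np.log2(np.asarray(public_tpm, dtype=float) + 1.0)
    labels = np.asarray(labels)
    return {str(t): log_tpm[labels == t].mean(axis=0) for t in np.unique(labels)}

def classify(query_tpm, centroids):
    """Return (best type, all scores) using Spearman correlation per centroid."""
    log_q = np.log2(np.asarray(query_tpm, dtype=float) + 1.0)
    scores = {}
    for cancer_type, centroid in centroids.items():
        rho, _ = spearmanr(log_q, centroid)
        scores[cancer_type] = float(rho)
    best = max(scores, key=scores.get)
    return best, scores

# Toy data only: two hypothetical cancer types with opposite expression trends.
base_a = np.arange(1, 21) * 10.0
base_b = base_a[::-1]
public = np.vstack([base_a, base_a * 1.2, base_b, base_b * 0.8])
labels = ["A", "A", "B", "B"]

centroids = build_centroids(public, labels)
best_type, scores = classify(base_a * 1.1, centroids)
```

Unlike the KS test, this keeps the gene identities paired, so a sample only scores highly when the same genes are high or low in both profiles.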