I hope this does not complicate the post beyond comprehension, but I would like help with two related questions about an RNA-Seq experiment that I recently performed.
I have a set of 25 leukemia samples that were sequenced using a whole RNA kit on the Illumina platform. I analyzed them with the new Tuxedo pipeline and arrived at gene-level TPM values for each sample. One important point about this experiment is that we don't have a control, because the cytogenetics of controls are very different from our samples. Now I need to classify the samples into groups of similar samples in order to test our hypothesis.
Q1. How do I normalize the gene-level TPM values across samples? They are my signals for classification, so I want them all on the same scale. I looked at tximport (scaledTPM) and the TMM method, but as far as I can tell both need a control, which is THE problem. One solution I came up with is to use inter-sample variation (standard deviation of expression): the more a gene varies across samples, the more important that gene is. [Please express your opinion on this]
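To make the variation-based idea in Q1 concrete, here is a minimal sketch under stated assumptions: TPM values sit in a plain dict keyed by gene, the values are log2(TPM + 1) transformed before the standard deviation is taken (an added assumption, so that highly expressed genes do not dominate the ranking), and the gene names, numbers, and the function name `rank_genes_by_variability` are all hypothetical.

```python
import math
import statistics


def rank_genes_by_variability(tpm, top_n=2):
    """Rank genes by the standard deviation of log2(TPM + 1) across samples.

    tpm: dict mapping gene name -> list of TPM values, one per sample.
    Returns (names of the top_n most variable genes, dict of gene -> SD).
    """
    sd_by_gene = {}
    for gene, values in tpm.items():
        # The log transform is an assumption of this sketch, not a requirement.
        logged = [math.log2(v + 1.0) for v in values]
        sd_by_gene[gene] = statistics.stdev(logged)
    ranked = sorted(sd_by_gene, key=sd_by_gene.get, reverse=True)
    return ranked[:top_n], sd_by_gene


# Toy, made-up TPM values for four hypothetical genes across four samples.
toy_tpm = {
    "GENE_A": [10.0, 12.0, 11.0, 9.0],
    "GENE_B": [1.0, 50.0, 2.0, 80.0],
    "GENE_C": [200.0, 210.0, 190.0, 205.0],
    "GENE_D": [0.0, 5.0, 0.0, 30.0],
}

top_genes, sd_by_gene = rank_genes_by_variability(toy_tpm, top_n=2)
print(top_genes)
```

With real data, `top_n` would be in the hundreds or thousands rather than 2. The cutoff is a judgment call, not something this sketch determines.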
Q2. My samples are close to each other in their clinical manifestations, so I need a sensitive method to classify them. I intend to do so using the most variably expressed genes. I tried unsupervised hierarchical clustering with the pheatmap package, which did not help much, and I also tried PCA on transcript-level expression. I would like to compare these with some other method.
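For reference, the kind of sample clustering described in Q2 can be sketched as below. This is only an illustration, not what pheatmap does internally: it uses 1 minus Pearson correlation as the distance between samples, with average linkage, on log-scale expression of a few selected genes. The sample names, values, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of samples with distance = 1 - Pearson r.

    profiles: dict mapping sample name -> list of expression values
              (same genes, same order, for every sample).
    Merges the two closest clusters until n_clusters remain.
    """
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1.0 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(candidates, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy, made-up log-expression values for four samples over four selected genes.
toy_profiles = {
    "S1": [1.0, 5.0, 1.2, 6.0],
    "S2": [1.1, 5.5, 1.0, 6.3],
    "S3": [6.0, 1.0, 5.5, 1.2],
    "S4": [5.8, 1.3, 6.1, 0.9],
}

clusters = average_linkage(toy_profiles, n_clusters=2)
print(clusters)
```

Correlation distance groups samples by the shape of their expression profile rather than by absolute level, which is one reason it is sometimes tried when Euclidean distance does not separate samples well. Whether it helps for these 25 samples is an open question.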
PS: I combined two different tracks of questions here to keep the underlying idea continuous.
Thanks in advance