I have multiple bulk RNA-seq datasets with metadata on response / no response to immune checkpoint blockade therapy.
I want to use Tumor Immune Dysfunction and Exclusion (TIDE) to predict response from the transcriptomic data.
TIDE requires normalized input. My data is already TPM normalized, but when I tried running TIDE on it directly, it turned out that further scaling is needed. According to the tutorial video, I need to compute log2(TPM + 1) and then "subtract average value across all patients".
Basically this is what I did in R:
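# tpm: genes in rows, samples (patients) in columns
tpm_log <- log2(tpm + 1)
# per-gene mean across all samples
mean_across_samples <- rowMeans(tpm_log)
# subtract each gene's mean from its row (sweep() defaults to FUN = "-")
tpm_final <- sweep(tpm_log, 1, mean_across_samples, FUN = "-")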
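As a sanity check on the centering, here is a minimal self-contained sketch on a small made-up matrix (the values and the gene/sample names are invented for illustration, not my real data). It assumes genes are in rows and samples are in columns, and checks that every gene has mean zero across samples after the sweep:

```r
# Toy TPM matrix: genes in rows, samples in columns (made-up values).
tpm <- matrix(
  c(0, 10, 100,
    5,  5,   5,
    1,  3,   7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

tpm_log <- log2(tpm + 1)
gene_means <- rowMeans(tpm_log)
# sweep() defaults to FUN = "-"; spelled out here for clarity.
tpm_final <- sweep(tpm_log, 1, gene_means, FUN = "-")

# Each gene should now average to (numerically) zero across samples.
centered_ok <- all(abs(rowMeans(tpm_final)) < 1e-12)
print(centered_ok)
```

If the real matrix were stored the other way round (samples in rows), the margin would have to be 2 with colMeans() instead, so the orientation is worth double-checking.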
The problem is that many of the predictions made by TIDE are wrong. Is the normalization step OK?
I also have a second question. I am unsure how to run the algorithm because my data covers four distinct cancer types, and some samples had prior immunotherapy while others did not. Should I process each dataset independently and then combine the results into a single data frame? From a statistical standpoint, what would be the most appropriate approach?
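To make the "process each dataset independently" option concrete, this is roughly what I have in mind. It is only a sketch, and I am not sure it is the right approach: `center_within_cohort` and the `cohort` vector (one label per sample, e.g. cancer type or cancer type plus prior-treatment status) are hypothetical names I made up, and the toy data is random.

```r
# Hypothetical helper: center log2(TPM + 1) within each cohort separately.
# `cohort` is an assumed vector with one label per column (sample) of `tpm`.
center_within_cohort <- function(tpm, cohort) {
  stopifnot(length(cohort) == ncol(tpm))
  tpm_log <- log2(tpm + 1)
  out <- tpm_log
  for (g in unique(cohort)) {
    idx <- which(cohort == g)
    sub <- tpm_log[, idx, drop = FALSE]
    # Subtract the per-gene mean computed only from this cohort's samples.
    out[, idx] <- sweep(sub, 1, rowMeans(sub), FUN = "-")
  }
  out
}

# Toy example with random values and two made-up cohort labels.
set.seed(1)
toy_tpm <- matrix(
  rexp(24, rate = 0.1), nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
toy_cohort <- c("typeA", "typeA", "typeA", "typeB", "typeB", "typeB")
toy_centered <- center_within_cohort(toy_tpm, toy_cohort)
```

The result keeps the original gene-by-sample layout, so the centered matrices for all cohorts already sit in one object that can be split or exported per cohort for TIDE.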
Thank you