What is your opinion on the following paragraph in section 6.3 of the RNA-seq workflow
(http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html) by the creators of the DESeq2 package?
"The heatmap becomes more interesting if we do not look at absolute expression strength but rather at the amount by which each gene deviates in a specific sample from the gene’s average across all samples. Hence, we center each genes’ values across samples, and plot a heatmap (figure below). We provide a data.frame that instructs the pheatmap function how to label the columns."
My doubt arises mainly when I add another dataset to our own. We have, say, 55 human SCLC RNA-seq samples, and I have another 81 samples from a different study (https://pubmed.ncbi.nlm.nih.gov/26168399/).
Suppose I have a large matrix in which each column (j) is a sample, with the first 55 columns being the Tempus samples and the remaining columns coming from the other study, and each row (i) holds the TPM values of one gene. Do you think it makes sense to look at:
Y_ij = TPM of gene_i_sample_j - mean(TPM of gene_i from all samples from both studies)
I am not sure whether looking at the relative expression of a gene across two different sample sets, as above, makes sense.
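To make the formula above concrete, here is a minimal sketch of the per-gene centering I have in mind, written in plain Python. The function name center_rows and the toy numbers are made up for illustration only; the real matrix would have 55 + 81 columns, and the two pairs of columns below just stand in for the two studies.

```python
from statistics import mean


def center_rows(tpm):
    """Return Y where Y[i][j] = tpm[i][j] - mean of row i over all columns."""
    centered = []
    for row in tpm:
        row_mean = mean(row)  # gene-wise mean across samples from both studies
        centered.append([value - row_mean for value in row])
    return centered


# Hypothetical toy matrix: 2 genes (rows) x 4 samples (columns).
# Columns 0-1 stand in for the first study, columns 2-3 for the second.
tpm = [
    [10.0, 12.0, 20.0, 22.0],
    [5.0, 5.0, 5.0, 5.0],
]

y = center_rows(tpm)
for gene_values in y:
    print(gene_values)
```

Note that the mean here is taken over every column at once, so samples from both studies contribute to the same per-gene average, which is exactly the step I am unsure about.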