I want to run a correlation analysis between matched TCGA mRNA and miRNA samples from multiple projects. Specifically, I want to run the analysis once for each tumor type (e.g. TCGA-LUAD) independently of the others, and then once more across all tumor types. I am unsure which order of operations is better: (1) pool the samples from all tumor types, normalize the data using counts from all samples, and then split the resulting matrix into per-tumor-type matrices for the analysis; or (2) build a separate count matrix for each tumor type first and normalize each one on its own.
In short, is it more appropriate to normalize the samples using the complete data set (which includes heterogeneous data from different cancer types) and then run the analysis on each subset (tumor type), or to normalize each subset independently of the others? I am using the variance-stabilizing transformation (VST) for normalization, by the way.
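To make the two options concrete, here is a minimal Python sketch on simulated counts. It is only an illustration under stated assumptions: a simple median-of-ratios size-factor scaling followed by log2 stands in for VST (it is not VST itself), and the matrices, sample numbers, and function names (`size_factors`, `normalize`) are all hypothetical.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))  # -inf where a count is 0
    log_geo_mean = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_mean)  # keep genes with no zero counts
    log_ratios = log_counts[usable] - log_geo_mean[usable][:, None]
    return np.exp(np.median(log_ratios, axis=0))


def normalize(counts):
    """Stand-in for VST (assumption): size-factor scaling, then log2(x + 1)."""
    return np.log2(counts / size_factors(counts) + 1.0)


# Simulated counts for two hypothetical tumor types (not real TCGA data).
rng = np.random.default_rng(0)
n_genes, n_per_type = 200, 6
base = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, 1))     # per-gene mean
fold = rng.lognormal(mean=0.0, sigma=1.0, size=(n_genes, 1))   # type-specific shift
depth = rng.uniform(0.5, 2.0, size=(1, 2 * n_per_type))        # per-sample depth
luad = rng.poisson(base * depth[:, :n_per_type])
brca = rng.poisson(base * fold * depth[:, n_per_type:])

# Option 1: pool all samples, normalize once, then subset to one tumor type.
pooled = np.hstack([luad, brca])
luad_pooled = normalize(pooled)[:, :n_per_type]

# Option 2: subset to one tumor type first, then normalize it on its own.
luad_alone = normalize(luad)

# Compare the normalized values the two options give for the same samples.
max_diff = np.abs(luad_pooled - luad_alone).max()
print(f"largest difference between the two options: {max_diff:.4f}")
```

If the per-tumor-type values differ between the two options, downstream correlations may differ as well. The sketch only shows how one might check this; it does not say which option is the right one.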
Thanks in advance for your time.