I want to analyze TCGA data for all cancer and normal tissue types, scaled and normalized together. After that, I tried PCA to visualize different variables, check for batch effects, and see how the different tissues and diagnoses cluster. However, the number of genes and studies is very large: it takes too long to see what the PCA looks like for even a single study variable, and I have about 20 potential covariates and batch factors!
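To make the question concrete, here is a minimal sketch (Python with NumPy, on synthetic data, not an actual TCGA pipeline) of one possible way to avoid inspecting a separate PCA plot per covariate: compute the PCA once, then score how much of each top PC a categorical covariate explains using a one-way ANOVA R². The function names `pca_scores`, `r_squared_categorical`, and `covariate_pc_table`, and the simulated `batch` and `random_label` variables, are hypothetical and only for illustration.

```python
import numpy as np


def pca_scores(expr, n_components=5):
    """PCA via SVD. expr is a samples x genes matrix of normalized log-expression."""
    centered = expr - expr.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    var_explained = (s ** 2) / np.sum(s ** 2)
    return scores, var_explained[:n_components]


def r_squared_categorical(pc, labels):
    """Fraction of the variance of one PC explained by a categorical covariate."""
    labels = np.asarray(labels)
    grand_mean = pc.mean()
    ss_total = np.sum((pc - grand_mean) ** 2)
    ss_between = sum(
        np.sum(labels == g) * (pc[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return ss_between / ss_total


def covariate_pc_table(scores, covariates):
    """Map each covariate name to an array of R^2 values, one per PC."""
    return {
        name: np.array(
            [r_squared_categorical(scores[:, k], labels) for k in range(scores.shape[1])]
        )
        for name, labels in covariates.items()
    }


# Synthetic demo (assumption: 60 samples, 500 genes, 3 batches of 20 samples).
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 500
batch = np.repeat(["A", "B", "C"], 20)
random_label = rng.choice(["x", "y"], size=n_samples)  # unrelated to expression

expr = rng.normal(size=(n_samples, n_genes))
shift = {"A": 0.0, "B": 3.0, "C": -3.0}
# Simulate a batch effect on the first 100 genes only.
expr[:, :100] += np.array([shift[b] for b in batch])[:, None]

scores, var_explained = pca_scores(expr, n_components=3)
table = covariate_pc_table(scores, {"batch": batch, "random_label": random_label})

for name, r2 in table.items():
    print(name, np.round(r2, 2))
```

With a table like this, all 20 covariates could be screened in one pass, and only those with a high R² on a leading PC would need an actual plot. Continuous covariates would need a correlation or regression in place of the ANOVA R².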
Should I take the PCA seriously, as recommended by most workflows? Can I keep only the genes of interest and ignore all of the variability in the other genes?
I am asking about subsetting genes because I read in another blog that it is recommended to keep all of the genes during normalization. I am using empirical Bayes normalization, following the limma-voom workflow.
Thank you so much, that was insightful!