I want to normalize the gene counts from GTEx using the variance stabilizing transformation (VST), but I'm confused about which variables I should include in the "design" when creating the DESeqDataSet. At the moment I'm doing the following:
dds <- DESeqDataSetFromMatrix(countData = gtex, colData = sampledata, design = ~ tissue) # generate the DESeq data set
dds <- dds[rowSums(counts(dds)) > 1, ] # remove genes with a total count of 0 or 1 across all samples
vsd <- vst(dds, blind = FALSE) # normalization taking tissue into account
However, this only accounts for tissue during the normalization. My question is: should I keep doing it this way, with all tissues together, or should I normalize each tissue separately with a design like ~ 1? Should I also include other variables, such as the experimental batch or post-mortem interval (PMI)?
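
To make the two options concrete, here is a rough sketch of what I mean, not something I know to be the right approach. The counts are simulated with DESeq2's makeExampleDESeqDataSet as a stand-in for the real GTEx matrix, and the `batch` column, the tissue names, and the object names `dds_all`, `vsd_all`, and `vsd_by_tissue` are all hypothetical placeholders.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the GTEx count matrix (3000 genes x 12 samples);
# in practice `gtex` would be the real matrix.
gtex <- counts(makeExampleDESeqDataSet(n = 3000, m = 12))

# Hypothetical sample annotation; `batch` is an assumed column name.
sampledata <- data.frame(
  tissue = factor(rep(c("heart", "liver", "lung"), each = 4)),
  batch  = factor(rep(c("b1", "b2"), times = 6)),
  row.names = colnames(gtex)
)

# Option A: all tissues together, with batch added to the design.
dds_all <- DESeqDataSetFromMatrix(countData = gtex, colData = sampledata,
                                  design = ~ batch + tissue)
dds_all <- dds_all[rowSums(counts(dds_all)) > 1, ]
vsd_all <- vst(dds_all, blind = FALSE)

# Option B: each tissue on its own, with an intercept-only design.
vsd_by_tissue <- lapply(levels(sampledata$tissue), function(t) {
  keep  <- sampledata$tissue == t
  dds_t <- DESeqDataSetFromMatrix(countData = gtex[, keep],
                                  colData = sampledata[keep, , drop = FALSE],
                                  design = ~ 1)
  dds_t <- dds_t[rowSums(counts(dds_t)) > 1, ]
  vst(dds_t, blind = FALSE)
})
names(vsd_by_tissue) <- levels(sampledata$tissue)
```

As far as I understand, with a design of ~ 1 the blind = FALSE setting should behave the same as blind = TRUE, since blind mode re-estimates dispersions with an intercept-only design. A continuous covariate such as PMI could presumably be added to the Option A design in the same way as batch.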