I have a set of 12 RNA-seq samples spread across 5 different groups (different tissues). At the moment I am focusing on making pairwise differential expression comparisons between the groups using DESeq2. I've noticed that different results are returned depending on whether I pass all the data to DESeq2 then request a contrast between two groups or instead manually select the groups I'm interested in then give just that data to DESeq2:
Metadata <- data.frame(name, count.file, group) des2.samples <- Metadata # Not sure if I should do this! # des2.samples <- des2.samples[des2.samples$group %in% c("Group2", "Group3"), ] # Create DESeqDataSet object from HTSeq-count files des2.data <- DESeqDataSetFromHTSeqCount(des2.samples, design = ~ group, directory = "data") # Calculate DE and get results des2.data <- DESeq(des2.data) des2.res <- results(des2.data, contrast = c("group", "Group2", "Group3")) # Order by padj des2.res <- des2.res[order(des2.res$padj), ] # Check out the summary summary(des2.res)
I believe that the differences are likely due to how DESeq2 does its filtering but I'm unsure what the best approach is, particularly as one group is a clear outlier to the others and may skew the results? I'm also wondering if a similar affect would be seen with other packages (edgeR, DESeq, voom etc.) and whether they would need to be treated differently.