Hi
I have a set of 12 RNA-seq samples spread across 5 different groups (different tissues). At the moment I am focusing on making pairwise differential expression comparisons between the groups using DESeq2. I've noticed that different results are returned depending on whether I pass all the data to DESeq2 then request a contrast between two groups or instead manually select the groups I'm interested in then give just that data to DESeq2:
Metadata <- data.frame(name, count.file, group)
des2.samples <- Metadata
# Not sure if I should do this!
# des2.samples <- des2.samples[des2.samples$group %in% c("Group2", "Group3"), ]
# Create DESeqDataSet object from HTSeq-count files
des2.data <- DESeqDataSetFromHTSeqCount(des2.samples, design = ~ group,
directory = "data")
# Calculate DE and get results
des2.data <- DESeq(des2.data)
des2.res <- results(des2.data, contrast = c("group", "Group2", "Group3"))
# Order by padj
des2.res <- des2.res[order(des2.res$padj), ]
# Check out the summary
summary(des2.res)
I believe that the differences are likely due to how DESeq2 does its filtering but I'm unsure what the best approach is, particularly as one group is a clear outlier to the others and may skew the results? I'm also wondering if a similar affect would be seen with other packages (edgeR, DESeq, voom etc.) and whether they would need to be treated differently.
Thanks