Question

Pairwise comparisons between multiple groups with DESeq2

9.1 years ago

lazappi • 0

Hi

I have a set of 12 RNA-seq samples spread across 5 different groups (different tissues). At the moment I am focusing on making pairwise differential expression comparisons between the groups using DESeq2. I've noticed that the results differ depending on whether I pass all the data to DESeq2 and then request a contrast between two groups, or instead subset to the two groups I'm interested in and give only that data to DESeq2:

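library(DESeq2)

# Sample table: sample name, HTSeq-count file name, then group
# (name, count.file and group are assumed to be defined beforehand)
Metadata <- data.frame(name, count.file, group)
des2.samples <- Metadata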

# Not sure if I should do this!
# des2.samples <- des2.samples[des2.samples$group %in% c("Group2", "Group3"), ]

# Create DESeqDataSet object from HTSeq-count files
des2.data <- DESeqDataSetFromHTSeqCount(des2.samples, design = ~ group, 
                                        directory = "data")

# Calculate DE and get results
des2.data <- DESeq(des2.data)
des2.res <- results(des2.data, contrast = c("group", "Group2", "Group3"))

# Order by padj
des2.res <- des2.res[order(des2.res$padj), ]

# Check out the summary
summary(des2.res)

I believe the differences are likely due to how DESeq2 does its filtering, but I'm unsure what the best approach is, particularly as one group is a clear outlier compared with the others and may skew the results. I'm also wondering whether a similar effect would be seen with other packages (edgeR, DESeq, voom, etc.) and whether they would need to be treated differently.

Thanks

RNA-Seq DESeq2 differential expression groups R • 5.9k views

updated 9.1 years ago by umer.zeeshan.ijaz ★ 1.8k • written 9.1 years ago by lazappi • 0

score 1 · Answer 1 · 2015-04-06

It depends on what you want. If you want to capture overall differences (which will obviously be skewed by the one outlier group), you can pool the groups together; otherwise you can do pairwise comparisons.
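As a rough illustration of the two options in DESeq2 terms, the sketch below fits all groups together and extracts one contrast, then refits using only the two groups of interest. It uses simulated counts from `makeExampleDESeqDataSet` rather than the data in the question; the group sizes and the object names (`dds_all`, `dds_sub`) are assumptions made for the example. In the all-groups fit, dispersions are estimated from every sample, so an outlier group can influence a contrast that does not involve it, which is one reason the two result tables may differ.

```r
library(DESeq2)

set.seed(1)

# Simulated data: 1000 genes x 12 samples (not the data from the question)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical layout: 12 samples spread across 5 groups
dds$group <- factor(rep(paste0("Group", 1:5), times = c(3, 2, 3, 2, 2)))
design(dds) <- ~ group

# Option 1: fit all groups together, then pull out one contrast
dds_all <- DESeq(dds, quiet = TRUE)
res_all <- results(dds_all, contrast = c("group", "Group2", "Group3"))

# Option 2: keep only the two groups of interest and refit
dds_sub <- dds[, dds$group %in% c("Group2", "Group3")]
dds_sub$group <- droplevels(dds_sub$group)
dds_sub <- DESeq(dds_sub, quiet = TRUE)
res_sub <- results(dds_sub, contrast = c("group", "Group2", "Group3"))

# Compare fold changes and their standard errors between the two fits
print(cor(res_all$log2FoldChange, res_sub$log2FoldChange, use = "complete.obs"))
print(summary(res_all$lfcSE - res_sub$lfcSE))
```

Comparing `lfcSE` and `padj` between `res_all` and `res_sub` on real data should show how much the extra groups change the Group2 vs Group3 result.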

I have two scripts, NB.R (based on DESeq2) and KW.R (based on Kruskal-Wallis with FDR correction), that you can use as alternatives for finding taxa/genes with log fold changes. In KW.R, I apply log-relative normalisation first.

The scripts take N×P count data, with N samples and P features (OTUs, genes and so on), and N×1 group data (a data frame with a factor column), and generate a barplot for the subset of OTUs/genes that are significantly different.
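For readers who want a feel for the Kruskal-Wallis route before opening the scripts, here is a minimal base R sketch of the general idea. It is not the KW.R script itself: the simulated data, the pseudocount of 1, the reading of "log-relative normalisation" as the log of per-sample relative abundance, and the 0.05 cutoff are all assumptions made for the example.

```r
set.seed(42)

# Simulated N x P counts (samples in rows, features in columns)
n_samples <- 30
n_features <- 50
groups <- data.frame(group = factor(rep(paste0("Group", 1:3), each = 10)))
counts <- matrix(rpois(n_samples * n_features, lambda = 20),
                 nrow = n_samples,
                 dimnames = list(paste0("S", 1:n_samples),
                                 paste0("F", 1:n_features)))

# Spike the first 5 features in Group1 so there is something to detect
in_g1 <- groups$group == "Group1"
counts[in_g1, 1:5] <- counts[in_g1, 1:5] + 60L

# Assumed normalisation: log of relative abundance, with a pseudocount of 1
log_rel <- log((counts + 1) / rowSums(counts + 1))

# Kruskal-Wallis test per feature, then Benjamini-Hochberg FDR adjustment
p_values <- apply(log_rel, 2,
                  function(x) kruskal.test(x, groups$group)$p.value)
padj <- p.adjust(p_values, method = "BH")

# Features called significantly different at the assumed cutoff
significant <- names(padj)[padj < 0.05]
print(significant)
```

Because relative abundances are compositional, a strong change in a few features can shift the others as well, so the list is best read alongside the raw counts.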

You can find them here:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/ecological.html

Best Wishes,

Umer