Multi-factor Analysis in DeSEQ2
1
0
Entering edit mode
10 weeks ago
pthom010 ▴ 10

I have a 22-day time course experiment I conducted with two plant genotypes and five time points (0, 1, 7, 14, 22 dpi). I have manged to successfully complete some basic analysis of my RNAseq data. I now wanted to do some multi factor analysis but wasn't sure where to stat. Here is the layout of my samples:

RA - resistant, 0 dpi RB - resistant, 1dpi RC - resistant, 7dpi RD - resistant, 14 dpi RE - resistant, 22 dpi

SA - susceptible, 0 dpi SB - susceptible, 1 dpi SC - susceptible, 7 dpi SD - susceptible, 14 dpi SE - susceptible, 22 dpi

Here are the analyses I have done so far:

RA-SA RB-SB RC-RC RD-SD RE-SE

RA-RB RA-RC RA-RD RA-RE

SA-SB SA-SC SA-SD SA-SE

I am uploading my data using the tximport function. I have designed my sample file like so:

sample

timepoint treatment genotype
S1A_rep1         0        SA        S
S1A_rep2         0        SA        S
S1A_rep3         0        SA        S
S1B_rep1         1        SB        S
S1B_rep2         1        SB        S
S1B_rep3         1        SB        S
S1C_rep1         7        SC        S
S1C_rep2         7        SC        S
S1C_rep3         7        SC        S
S1D_rep1        14        SD        S
S1D_rep2        14        SD        S
S1D_rep3        14        SD        S
S1E_rep1        22        SE        S
S1E_rep2        22        SE        S
S1E_rep3        22        SE        S
R1A_rep1         0        RA        R
R1A_rep2         0        RA        R
R1A_rep3         0        RA        R
R1B_rep1         1        RB        R
R1B_rep2         1        RB        R
R1B_rep3         1        RB        R
R1C_rep1         7        RC        R
R1C_rep2         7        RC        R
R1C_rep3         7        RC        R
R1D_rep1        14        RD        R
R1D_rep2        14        RD        R
R1D_rep3        14        RD        R
R1E_rep1        22        RE        R
R1E_rep2        22        RE        R
R1E_rep3        22        RE        R


The specific four comparisons I want to make are the following:

(RB - SB) - (RA - SA) (compare RB and SB, then compare that to RA and SA, compared)
(RE + SE) - (RA + SA)
(RE + RD) - (RC + RB)
[(RE + RD) - (RC + RB)] - [(SE + SD) - (SC - SB)] (compare the early vs late response of the R plants to that of the susceptible)


Do I need to change my sample object to facilitate these comparisons or is there another piece of code I should be using?

rnaseq r DESeq2 • 365 views
4
Entering edit mode
10 weeks ago

You can perform complex contrasts in DESeq2 using the contrasts arguement to results. The contrasts argument can accept a list with two entries - one for the coefficients to add, and one for the coefficients to subtract. So, for example, in your first desired contrast,

(RB - SB) - (RA - SA) == RB + SA - SB - RA


So RB and SA are the "add" coefficients and SB and RA are the "subtract" ones. I don't quite know how your design is configured, so I don't know how the coefficients will be named.

The relevant section of the DESeq2 manual is here, and it recommends this tutorial on specifying complex contrasts.

A couple of things to watch out for here:

1. You are doing a lot of contrasts, but you are only calculating the FDR over each set individually. There is an argument that you should be calculating your FDR across all the comparisons you make together (including the "easy" ones you mention at the top). This is going to absoluately murder your power.
2. This is a complex design, with lots of coefficients to estimate, but only three replicates in each cell. This again is a power drain. You might like to try something like stageR. This filters for main effects first, and only tests interactions for genes where there is a significant main effect. This has the effect of preserving your power by not carrying out tests that are unlikely to be of interest anyway.
0
Entering edit mode

Very nice answer, I just wanted to add that as far as I know, there is no clear consensus on [1] (calculating FDR across multiple contrasts). While making multiple contrasts increases the number of false positives, it does not necessarily increases the rate of false positive (FDR). There was an interesting discussion a few weeks ago on biostars, pointing to a few posts from Gordon Smyth (author of Limma/edgeR). He seems to stands for applying FDR to each contrast separately : What are the necessary and sufficient conditions for applying a multiple hypothesis correction? .

0
Entering edit mode

(RB - SB) - (RA - SA) == RB + SA - SB - RA


Why would the RB and SA be add coefficients if I wanted to compare each genotype's 1dpi timepoint to its 0dpi timepoint and then compare those comparisons to each other?

0
Entering edit mode

It's just how the math works out, there's a reason we only think in terms of the left side of that equation. Your question is asking for the interaction between genotype and timepoint, which is the difference of differences on the left.

0
Entering edit mode

OK, so if I were to do the (RB - SB) - (RA - SA), would I use the following code:

adds = RB, SA
minuses = SA, RA

0
Entering edit mode

Its just the result of mutlipling out the brackets:

(RB - SB) - (RA - SA) == (+1) * (RB - SB) + (-1) * (RA - SA)
== 1*RB - 1*SB - 1*RA - 1 * (-SA)
== RB - SB - RA + SA


The code you would you would be something like:

adds = c("treatmentRB", "treatmentSA")
subtracts = c("treatmentSB", "treatment RA")


Although the precise things that would go in those first two lines would depend on what the coefficients are called. You could get the names to use by using resultsNames(deseqData)

0
Entering edit mode

Got It! I figured out how to do the specific contrasts I wanted. I just produced the model matrix of my analysis with model.matrix(design(dds), colData(dds)) and then labeled new objects based on how I labeled my treatments (RA = colMeans(mod.mat[deseqData\$treatment == "RA", ]) , then ran the contrasts I had listed above. Thanks.