I have a question concerning DESeq2's multi-factor design for a RIP-seq experiment. My experimental design is quite complicated, as can be seen in the following table:
Here, input means RNA after extraction from the cells and output refers to the immunoprecipitated (IP) RNA, where IP uses an antibody against the tag. Therefore, the samples containing a tagged protein should show an enrichment of RNAs compared to the input and, if the signal is real (no random binding), also compared to the tag-free samples.
My initial approach was to follow the analysis described here: https://support.bioconductor.org/p/61509/ and test for significant enrichment under 'stress' and 'no stress' separately with the following design:
    design <- ~ condition + input_output + condition:input_output
    ddsCountMatrix <- DESeqDataSetFromMatrix(
      colData   = sample_information_stress,
      countData = count_table_stress,
      design    = design
    )
    dds <- DESeq(ddsCountMatrix)
    reduce <- ~ condition + input_output
    dds <- DESeq(dds, test = 'LRT', reduced = reduce)
    res <- results(dds, altHypothesis = 'greater')
The model matrix generated by this design looks like this:
       (Intercept) input_outputoutput conditionDhh1 input_outputoutput:conditionDhh1
    5            1                  0             0                                0
    6            1                  0             1                                0
    7            1                  0             0                                0
    8            1                  0             1                                0
    13           1                  1             0                                0
    14           1                  1             1                                1
    15           1                  1             0                                0
    16           1                  1             1                                1
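For readers who want to reproduce this matrix without running DESeq2, here is a minimal sketch in base R. The sample sheet is reconstructed from the row names and columns shown above, so `sample_info` and the level name `no_tag` are assumptions, not my actual `colData`. The column order of the matrix also suggests that `input_output` comes first in the formula, which I assume here.

```r
# Hypothetical sample sheet inferred from the model matrix above.
# "no_tag" is an assumed name for the reference level of `condition`.
sample_info <- data.frame(
  input_output = factor(rep(c("input", "output"), each = 4),
                        levels = c("input", "output")),
  condition    = factor(rep(c("no_tag", "Dhh1"), times = 4),
                        levels = c("no_tag", "Dhh1")),
  row.names    = as.character(c(5:8, 13:16))
)

# Same terms as the full design; the reference levels (input, no_tag)
# are absorbed into the intercept.
mm <- model.matrix(~ input_output + condition + input_output:condition,
                   data = sample_info)
print(mm)
```

With this coding, the interaction column is 1 only for the tagged output samples (rows 14 and 16), so its coefficient is the difference of differences: (output Dhh1 - input Dhh1) - (output no_tag - input no_tag) on the log2 scale.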
So the intercept corresponds to the 'no tag', 'input' samples. However, contrary to my expectation, the genes in the resulting list are more highly expressed in the control samples, even though they have a positive LFC (as requested via altHypothesis).
My questions therefore are:
1. What am I missing in the design to find genes that are enriched in output tag vs. input tag and output no_tag?
2. Is it possible to include the stress vs. no stress comparison in the design as well? Or should I stick to identifying genes above background with the above methodology and then continue with a smaller gene list for comparing stress vs. no stress?
Any help is greatly appreciated, thanks in advance and best regards,