We recently ran an RNA-seq experiment whose analysis I'm hoping to get some input on. I'm interested in the general effect that each of the CRISPR editing reagents has on hematopoietic stem cells, not the effect of editing a specific gene. We electroporated each of the reagents separately or in combination to assess the effects.
The sample groups I had were Mock, RNP only (complexed Cas9 protein + gRNA electroporated into cells), ssODN electroporation, AAV only, RNP + ssODN, and RNP + AAV (with 3 biological replicates for each).
Someone mentioned to me that comparing this many samples together for differential analysis might give inaccurate results and that I should compare each of the samples separately. I previously created a vector with each of these 6 conditions together and input "design = ~condition" into DESeq2. I set the mock as the reference level so each of the samples would be assessed for differential expression against the mock treatment. Does this seem like a valid design strategy?
Also, the final two conditions (RNP+ssODN or RNP+AAV) are simply combinations of two previous conditions. Should I attempt to capture this relationship in the design formula? Does anyone have an idea for how to go about this?
I'm interested in essentially all differences between samples, including AAV vs ssODN, AAV vs RNP, RNP+ssODN vs ssODN or RNP alone, AAV+RNP vs RNP, AAV+RNP vs AAV, AAV+RNP vs ssODN, and AAV+RNP vs RNP+ssODN. Is there a more applicable way to look at these differences than a simple differential expression analysis of each of the conditions vs mock? Is there a way to construct the heat map so it's comparing between all conditions simultaneously rather than illustrating the fold change vs mock?
~ condition, with the conditions as a factor, is a perfectly valid design. Your two combo conditions are not "simply combinations of two previous conditions" within the statistical design: you can't just add the "RNP-vs-baseline" and "ssODN-vs-baseline" coefficients to the baseline to determine expression in "RNP+ssODN". If you do, you'll miss synergistic/interaction effects of the two reagents, and the fitted coefficients will be biased. To capture these effects the model needs an interaction term, and the ~ condition design already encodes it implicitly, because each combination is its own factor level with its own coefficient. If you're interested in global inter-group variability, you should use an ANOVA-style test over all possible contrasts at once; in DESeq2 this is the likelihood ratio test rather than a literal F-test (use test = "LRT" as described here: https://support.bioconductor.org/p/73172/#78824).
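To make the interaction point concrete, here is a minimal sketch of the arithmetic in plain Python (not DESeq2 itself, which is R). The group means are made-up log2 values for a single hypothetical gene, and the underscore level names such as RNP_ssODN are just assumed labels. Each "vs Mock" value corresponds to a coefficient that ~ condition would fit with Mock as the reference, and the interaction is whatever the combination shows beyond the sum of the two single-reagent effects.

```python
# Hypothetical mean log2 expression of one gene in four of the six groups.
# These numbers are invented purely for illustration.
group_means = {
    "Mock": 5.0,
    "RNP": 6.0,
    "ssODN": 5.5,
    "RNP_ssODN": 8.0,
}


def vs_mock(means, group):
    """Log2 fold change of `group` relative to the Mock reference level."""
    return means[group] - means["Mock"]


def interaction(means, combo, part_a, part_b):
    """Effect of the combination beyond the sum of the two single effects."""
    return vs_mock(means, combo) - vs_mock(means, part_a) - vs_mock(means, part_b)


rnp = vs_mock(group_means, "RNP")
ssodn = vs_mock(group_means, "ssODN")
combo = vs_mock(group_means, "RNP_ssODN")
extra = interaction(group_means, "RNP_ssODN", "RNP", "ssODN")

print(f"RNP vs Mock:        {rnp}")
print(f"ssODN vs Mock:      {ssodn}")
print(f"RNP+ssODN vs Mock:  {combo}")
print(f"interaction effect: {extra}")
```

With these invented numbers the single effects sum to 1.5 but the combination sits 3.0 above Mock, so an additive-only model would miss the remaining 1.5. In DESeq2 the same quantity can be requested as a contrast of the fitted coefficients through the contrast argument of results().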
I was thinking that the design was valid as well; this person just made me have an "oh crap" moment where I questioned everything. Thanks for the input. I perhaps didn't choose my words carefully enough, but yes, there are definitely different effects in the combination than just the sum of the two parts. Thanks for the clarification on the interaction term part. I'm not incredibly well versed in stats and didn't guess that point beforehand, but that's really useful to know.
I've looked over a few things about the F-test and this seems like exactly what I was looking for. It's a bit over my head statistics-wise, though. Could I provide the exact same design formula? Thanks again for the great answers!
If you've any stats questions, feel free to ask on here - it's basically the only thing I can bring to biostars. There shouldn't be any reason to change the design.
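As a rough illustration of why the design can stay as it is: with Mock as the reference, every pairwise comparison among the six groups is just a difference of the fitted "vs Mock" coefficients. The sketch below (plain Python, with assumed underscore level names) enumerates all 15 pairs and the weights each one puts on those coefficients.

```python
from itertools import combinations

# Assumed level names; Mock is the reference and so has no coefficient.
conditions = ["Mock", "RNP", "ssODN", "AAV", "RNP_ssODN", "RNP_AAV"]
reference = "Mock"
coefficients = [c for c in conditions if c != reference]


def contrast_vector(numerator, denominator):
    """Weights on the vs-Mock coefficients for `numerator` vs `denominator`."""
    weights = {c: 0 for c in coefficients}
    if numerator != reference:
        weights[numerator] += 1
    if denominator != reference:
        weights[denominator] -= 1
    return [weights[c] for c in coefficients]


# Every unordered pair of groups, written as (later level) vs (earlier level).
contrasts = {
    (b, a): contrast_vector(b, a) for a, b in combinations(conditions, 2)
}

print("coefficient order:", coefficients)
for (numerator, denominator), weights in contrasts.items():
    print(f"{numerator} vs {denominator}: {weights}")
```

In DESeq2 itself you would not build these vectors by hand; a call such as results(dds, contrast = c("condition", "AAV", "ssODN")) extracts one pairwise comparison from the same fitted model, assuming the factor column is named condition and uses those level names.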
I would first of all make a PCA plot to see how samples cluster. This will give an idea how large the effects of the single treatments are.
Thanks for the response. The single treatments clustered to diagonal corners of the PCA plot, though I'm not really sure what information to take from that. I'll attach a link to the PCA plot. Overall, the clustering looks really nice and is generally what I'd expect.