ANOVA or Linear regression for RNAseq data analysis with 3 factors
1
0
Entering edit mode
3.0 years ago
alfonseca • 0

Hi, I have an statistics doubt that probably is basic but I'm not clear about that.

I'm trying to find genes with differential expression, which RNA levels depend on 3 experimental factors (factor A and B with 2 levels each and a factor C with 3 levels). We've done a 3-way ANOVA from the log2(normalized counts) from different libraries and also a linear regression to look for genes with a p<0.05 for the intersection of the 3 factors.

I'm not sure if it is correct to do a linear regression or an ANOVA, considering my experimental design. I've observed that the number of genes with a significant p-value for the triple intersection is very lower when we apply the linear regression.

Does anyone can help me if it is a correct statistical analysis? I really don't understand very well the difference between ANOVA and linear regression, but I've seen that linear regression is always used in 2-way ANOVA for RNA seq data.

Any help is welcome.

RNA-Seq • 2.5k views
2
Entering edit mode
3.0 years ago
Benn 8.2k

ANOVA and linear regression are basically the same. However, for transcriptomics data it is better to use the linear models in special software, such as DEseq2, edgeR or limma. In these programs you can use the same linear models as you do now with your ANOVA, but they use statistics better suitable for RNA-seq data. I often use limma voom or limma trend for RNA-seq data, it makes use of moderated t-statistic. Read more about it in their manuals.

0
Entering edit mode

Indeed, please use one of the dedicated programs for this. Both EdgeR and DESeq2 fit a negative binomial model to your counts and derive p-values from this model. In doing this, they take dispersion into account. Limma does it differently, as elaborated by b.nota.

0
Entering edit mode

And all of them supports "ANOVA-Style" tests - check the vignettes for the different tools.

0
Entering edit mode

Hi Kevin,

Can we check if the main effects or interaction effects are significant in DESEQ2? I want to see if the interaction effect is significant but I couldn't see the p-value of interaction and main effects separately in DESEQ2? I might be missing something. Do you have any insights on this?

Thanks

0
Entering edit mode

There is information on this in the results() function manual entry, i.e., information on how to get the p-values for the interaction term. Simply type ?results at the command prompt and scroll to the bottom where examples are provided.

0
Entering edit mode

Hi Kevin,

Thanks you for the response. When I use results(), I get something like this;

results(DESeq.A)
log2 fold change (MLE): GenotypeDarij.Trttreatment
LRT p-value: '~ Genotype + Trt + Genotype:Trt' vs '~ Genotype + Trt'
DataFrame with 35225 rows and 6 columns


I think the output p-value column is the one showing DEGs because of the interaction. Does it mean that if I get many DEGs at p.adj < 0.05 from the interaction effect, I should consider the DEGs from interaction only and not consider the main effects?