Question: ANOVA or Linear regression for RNAseq data analysis with 3 factors
0
gravatar for alfonseca
11 weeks ago by
alfonseca0
alfonseca0 wrote:

Hi, I have an statistics doubt that probably is basic but I'm not clear about that.

I'm trying to find genes with differential expression, which RNA levels depend on 3 experimental factors (factor A and B with 2 levels each and a factor C with 3 levels). We've done a 3-way ANOVA from the log2(normalized counts) from different libraries and also a linear regression to look for genes with a p<0.05 for the intersection of the 3 factors.

I'm not sure if it is correct to do a linear regression or an ANOVA, considering my experimental design. I've observed that the number of genes with a significant p-value for the triple intersection is very lower when we apply the linear regression.

Does anyone can help me if it is a correct statistical analysis? I really don't understand very well the difference between ANOVA and linear regression, but I've seen that linear regression is always used in 2-way ANOVA for RNA seq data.

Any help is welcome.

Thanks in advance.

rna-seq • 264 views
ADD COMMENTlink modified 5 weeks ago by Biostar ♦♦ 20 • written 11 weeks ago by alfonseca0
2
gravatar for Benn
11 weeks ago by
Benn6.6k
Netherlands
Benn6.6k wrote:

ANOVA and linear regression are basically the same. However, for transcriptomics data it is better to use the linear models in special software, such as DEseq2, edgeR or limma. In these programs you can use the same linear models as you do now with your ANOVA, but they use statistics better suitable for RNA-seq data. I often use limma voom or limma trend for RNA-seq data, it makes use of moderated t-statistic. Read more about it in their manuals.

ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by Benn6.6k

Indeed, please use one of the dedicated programs for this. Both EdgeR and DESeq2 fit a negative binomial model to your counts and derive p-values from this model. In doing this, they take dispersion into account. Limma does it differently, as elaborated by b.nota.

ADD REPLYlink written 11 weeks ago by Kevin Blighe41k

And all of them supports "ANOVA-Style" tests - check the vignettes for the different tools.

ADD REPLYlink written 10 weeks ago by kristoffer.vittingseerup1.7k

Hi Kevin,

Can we check if the main effects or interaction effects are significant in DESEQ2? I want to see if the interaction effect is significant but I couldn't see the p-value of interaction and main effects separately in DESEQ2? I might be missing something. Do you have any insights on this?

Thanks

ADD REPLYlink written 4 weeks ago by rthapa0

There is information on this in the results() function manual entry, i.e., information on how to get the p-values for the interaction term. Simply type ?results at the command prompt and scroll to the bottom where examples are provided.

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

Hi Kevin,

Thanks you for the response. When I use results(), I get something like this;

results(DESeq.A)
log2 fold change (MLE): GenotypeDarij.Trttreatment 
LRT p-value: '~ Genotype + Trt + Genotype:Trt' vs '~ Genotype + Trt' 
DataFrame with 35225 rows and 6 columns

I think the output p-value column is the one showing DEGs because of the interaction. Does it mean that if I get many DEGs at p.adj < 0.05 from the interaction effect, I should consider the DEGs from interaction only and not consider the main effects?

ADD REPLYlink modified 4 weeks ago by Kevin Blighe41k • written 4 weeks ago by rthapa0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 799 users visited in the last hour