Hi,

I have 4 groups (3 species and a hybrid between two of them). I am conducting pairwise gene expression analysis.

I could estimate the per-gene dispersion using all 4 groups (23 samples in total), or I could estimate it separately for each pairwise comparison. The two sets of estimates look similar overall. However, in the plot of per-gene dispersion as a function of normalized mean counts, the dispersion appears smaller at higher counts when I estimate it from one pair at a time.

This is only a visual impression, though. Is there a proper way to determine which approach would give me a "better" estimate of the per-gene dispersion?

Thank you!

This is really helpful, thank you so much. I have done a PCA, but only to check visually whether my samples group as expected from my experimental design. I am going to use PCAtools now for a more detailed analysis of variance. Just one question: how could I extract the amount of within-group variation from the PCA? Should I do a PCA on one group at a time and look at the amount of variance explained by PC1 and PC2, for example?

I just do that by eye tbh... If the samples cluster well by group and the distance between the points within each group is similar, then keep it simple and do not split the experiment. But I have never really been in a situation where this was critical, so others might suggest a more quantitative and reproducible way of doing it.
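
For what it's worth, one possible way to put a number on "the distance between the points within each group" is the mean distance of each sample to its group centroid. A PCA with all components kept is only a centering and rotation of the data, so these distances are the same whether they are computed on all PCs or directly on the transformed expression values (using only PC1 and PC2 would be an approximation). Below is a minimal Python sketch under that assumption. The function name `within_group_spread` and the toy values are made up for illustration, and this is not part of PCAtools or any other package.

```python
import math

def within_group_spread(samples, groups):
    """Mean Euclidean distance of each sample to its group centroid.

    samples: list of equal-length lists of transformed expression values,
             one inner list per sample (assumed to be log-scale or
             variance-stabilized values, not raw counts).
    groups:  list of group labels, one per sample.
    """
    # Collect the samples belonging to each group.
    by_group = {}
    for values, label in zip(samples, groups):
        by_group.setdefault(label, []).append(values)

    spread = {}
    for label, members in by_group.items():
        n_genes = len(members[0])
        # Per-gene mean across the samples of this group.
        centroid = [sum(m[g] for m in members) / len(members)
                    for g in range(n_genes)]
        dists = [math.dist(m, centroid) for m in members]
        spread[label] = sum(dists) / len(dists)
    return spread

# Hypothetical toy data: 4 samples, 2 genes, 2 groups.
samples = [[1.0, 2.0], [1.0, 4.0], [5.0, 5.0], [5.0, 5.0]]
groups = ["A", "A", "B", "B"]
spread = within_group_spread(samples, groups)
for label, value in spread.items():
    print(label, round(value, 3))
```

If the values per group come out in the same range, that would support the "keep it simple" route above; one group with a much larger value would be the case where splitting might be worth considering. What counts as "similar" is still a judgment call.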

I see, thank you. By the way, these are the plots of dispersion estimates as a function of normalized counts. The first is when using all groups and the second is when using a pair, that is, only two groups.

All groups

Two groups
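
To go beyond comparing the two plots by eye, one option is to summarise each set of estimates as the median dispersion within bins of mean count and compare the bins side by side. The following is a minimal Python sketch, assuming the per-gene mean counts and dispersion estimates have been exported as plain lists for the same genes. The function name `median_dispersion_by_bin`, the bin edges, and all numbers are hypothetical. Note that this only describes how the two sets differ; it does not say which one is "better".

```python
import math
from statistics import median

def median_dispersion_by_bin(mean_counts, dispersions, edges):
    """Median dispersion of genes within bins of log10(mean count).

    mean_counts: per-gene normalized mean counts.
    dispersions: per-gene dispersion estimates, same order as mean_counts.
    edges:       ascending bin edges on the log10(mean count) scale.
    Returns one median per bin, or None for an empty bin.
    """
    bins = [[] for _ in range(len(edges) - 1)]
    for mu, disp in zip(mean_counts, dispersions):
        if mu <= 0:
            continue  # log10 is undefined for zero counts; skip these genes
        x = math.log10(mu)
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                bins[i].append(disp)
                break
    return [median(b) if b else None for b in bins]

# Hypothetical toy values for six genes (not real estimates).
mean_counts = [5, 50, 80, 500, 5000, 8000]
disp_all = [1.0, 0.5, 0.3, 0.2, 0.10, 0.12]    # estimated from all groups
disp_pair = [1.1, 0.4, 0.3, 0.2, 0.05, 0.07]   # estimated from one pair
edges = [0, 1, 2, 3, 4]                        # log10 bins: 1-10, 10-100, ...

med_all = median_dispersion_by_bin(mean_counts, disp_all, edges)
med_pair = median_dispersion_by_bin(mean_counts, disp_pair, edges)
for lo, a, p in zip(edges, med_all, med_pair):
    print(f"log10 mean >= {lo}: all groups = {a}, pair = {p}")
```

One caveat: the normalized mean count of a gene can differ between the full data set and a two-group subset, so for a fair comparison it is probably best to bin both sets of estimates by the same mean values, as done here.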