Dispersion estimation using DESeq2
3.2 years ago
hpapoli ▴ 140

Hi,

I have 4 groups (3 species and a hybrid between two of them). I am conducting pairwise gene expression analysis.

I could estimate per-gene dispersion using all 4 groups (23 samples in total), or I could estimate it separately for each pairwise comparison. The plots of per-gene dispersion as a function of normalized mean counts look similar overall, but when I estimate dispersion using one pair at a time, the dispersion appears smaller at higher gene counts.

However, this is only a visual impression. Is there a proper way to tell which approach gives a "better" estimate of per-gene dispersion?

Thank you!

RNA-Seq deseq2 • 1.4k views
3.2 years ago
ATpoint 81k

People often use PCA (or a similar dimensionality reduction that compresses the information of the variable genes into lower dimensions) to explore how the samples behave and whether distances between replicates of the same group are similar across all groups. If they are similar, that argues for keeping all data together in a single analysis, which also makes downstream analysis more feasible, as you would only have a single DESeqDataSet to deal with. If, on the other hand, some groups have a notably larger spread than others, this might affect the results of individual comparisons where the dispersion is actually lower than estimated for the total dataset. DESeq2 has a plotPCA function for this, and the Bioconductor package PCAtools is far more versatile. PCA also helps to identify potential outliers due to technical issues or batch effects that might need to be addressed. It should (imho) always be a standard step in the initial QC/data exploration.


This is really helpful, thank you so much. I have done a PCA, but only to check visually whether my samples group as expected from my experimental design. I am going to use PCAtools now for a more detailed analysis of variance. Just one question: how could I extract the amount of within-group variation from the PCA? Should I run the PCA on one group at a time and look at the variance explained by PC1 and PC2, for example?


I just do that by eye tbh... If they cluster well by group and the distance between the points per group is similar, then keep it simple and do not split the experiment. But I have never really been in a situation where this was critical, so others might suggest a more quantitative and reproducible way of doing it.
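One possible way to put a number on the "by eye" check is to measure, per group, the mean distance of each sample to its group centroid in PC space. This is only a minimal sketch, not an established DESeq2 procedure: the function name `within_group_spread` and the coordinates are hypothetical, and it assumes you have already exported per-sample PC1/PC2 coordinates (for instance from `plotPCA(..., returnData=TRUE)`).

```python
from collections import defaultdict
from math import dist
from statistics import mean


def within_group_spread(coords, groups):
    """Mean Euclidean distance of each sample to its group centroid."""
    by_group = defaultdict(list)
    for point, group in zip(coords, groups):
        by_group[group].append(point)

    spread = {}
    for group, points in by_group.items():
        # Centroid = per-axis mean of the group's PC coordinates.
        centroid = [mean(axis) for axis in zip(*points)]
        spread[group] = mean(dist(p, centroid) for p in points)
    return spread


# Hypothetical PC1/PC2 coordinates for two groups of two samples each.
coords = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 6.0)]
groups = ["A", "A", "B", "B"]
spread = within_group_spread(coords, groups)
print(spread)
```

In this toy input, group B is spread three times as widely as group A. Keep in mind that PC1 and PC2 capture only part of the total variance, so a number like this is a rough guide and not a formal test.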


I see, thank you. By the way, these are the plots of dispersion estimates as a function of normalized counts. The first is when using all groups and the second is when using a pair, that is only two groups.

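[Plot: dispersion estimates vs. normalized mean counts, all groups]

[Plot: dispersion estimates vs. normalized mean counts, two groups]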
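To go beyond comparing the two plots visually, one option is to compare the per-gene dispersion estimates from the two fits directly, binned by mean count. The following is a minimal sketch under stated assumptions: the function name `median_log2_ratio_by_bin` and all numbers are hypothetical, and it assumes the per-gene base means and dispersions from both fits have been exported in the same gene order.

```python
from collections import defaultdict
from math import floor, log2, log10
from statistics import median


def median_log2_ratio_by_bin(base_mean, disp_all, disp_pair):
    """Median log2(pairwise / all-groups dispersion) per decade of mean count."""
    bins = defaultdict(list)
    for m, d_all, d_pair in zip(base_mean, disp_all, disp_pair):
        # Skip genes without a usable estimate in either fit.
        if m > 0 and d_all > 0 and d_pair > 0:
            bins[floor(log10(m))].append(log2(d_pair / d_all))
    return {b: median(v) for b, v in sorted(bins.items())}


# Hypothetical values for five genes (not real data).
base_mean = [5, 50, 80, 500, 5000]
disp_all = [0.4, 0.2, 0.2, 0.1, 0.08]
disp_pair = [0.4, 0.1, 0.2, 0.05, 0.02]
ratios = median_log2_ratio_by_bin(base_mean, disp_all, disp_pair)
print(ratios)
```

Negative values in the higher bins would match the impression that the pairwise fit gives smaller dispersions at high counts. This only shows where the two fits differ; it does not say which estimate is closer to the truth.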
