Question

Gender Bias correction with DESeq2

3.6 years ago

sigalottil • 0

Hi everyone, I am dealing with an RNA-seq dataset that has a very unbalanced gender distribution between the 2 classes I need to compare. In detail, the "control" class has 11 males and 2 females, while the "case" class has 1 male and 8 females. I am wondering whether there is an adequate and simple method to mitigate this imbalance when performing differential expression analysis. I am considering using the batch correction option in DESeq2, design = ~ Sex + Type, but I do not know what to expect, given that the "confounder" is so disproportionately distributed, or whether the option is appropriate. As you can tell from the basic level of the question, I am new to this field. Thank you for the help.

RNA-Seq • 1.9k views

updated 3.6 years ago by Kevin Blighe 87k • written 3.6 years ago by sigalottil • 0

score 4 · Answer 1 · 2020-09-02

3.6 years ago

Kevin Blighe 87k

Hey, DESeq2 does not actually have a dedicated batch correction option, at least not as a function. What you can do is include gender (or sex, depending on how the information is actually reported for each individual) in your design formula, in which case any test statistics that you derive for another term will be adjusted for the effect of gender. For example, with ~ gender + disease, the test statistics derived for disease will be adjusted for the effect of gender.
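As a rough illustration of the design-formula approach, here is a minimal sketch using the column names from the question (Sex, Type). The counts are simulated with DESeq2's own makeExampleDESeqDataSet(), so only the group sizes are taken from the thread; everything else is an assumption for demonstration.

```r
# Minimal sketch: adjust the Type effect for Sex through the design formula.
# Counts are simulated; only the group sizes come from the question.
library(DESeq2)

set.seed(1)

# 500 simulated genes x 22 samples (13 controls + 9 cases)
dds <- makeExampleDESeqDataSet(n = 500, m = 22)

# Sample annotation: control = 11 male + 2 female, case = 1 male + 8 female
dds$Sex  <- factor(c(rep("male", 11), rep("female", 2),
                     rep("male", 1),  rep("female", 8)))
dds$Type <- factor(c(rep("control", 13), rep("case", 9)),
                   levels = c("control", "case"))

# Covariate first, variable of interest last
design(dds) <- ~ Sex + Type
dds <- DESeq(dds)

# Case vs control, adjusted for Sex
res <- results(dds, contrast = c("Type", "case", "control"))
head(res[order(res$pvalue), ])
```

With real data, the count matrix and sample sheet would instead go through DESeqDataSetFromMatrix() with the same design formula.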

Also, please do not eliminate a 'batch' effect just based on a feeling or a 'hunch'. You need evidence that a particular factor, like gender, is biasing your data in some form. This can easily be inferred via a PCA bi-plot, but look beyond PC1 and PC2 toward other PCs, too, and take into account the explained variation along each PC.
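To look beyond PC1 and PC2, something along these lines may help. It is only a sketch in base R on a simulated matrix with an artificial sex effect; with DESeq2 the input would typically be a transformed matrix such as assay(vst(dds)). The helper r2() and the table layout are my own choices, not anything prescribed by DESeq2.

```r
# Sketch: variance explained per PC, plus how strongly each PC tracks
# sex and condition. The expression matrix is simulated (assumption).
set.seed(1)

sex  <- factor(c(rep("male", 11), rep("female", 2),
                 rep("male", 1),  rep("female", 8)))
type <- factor(c(rep("control", 13), rep("case", 9)),
               levels = c("control", "case"))

# Stand-in for a variance-stabilised matrix: 1000 genes x 22 samples
expr <- matrix(rnorm(1000 * 22), nrow = 1000)

# Artificial sex effect on the first 200 genes
is_male <- sex == "male"
expr[1:200, is_male] <- expr[1:200, is_male] + 3

# PCA with samples as rows
pca     <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# R^2 from regressing PC scores on a factor
r2 <- function(scores, f) summary(lm(scores ~ f))$r.squared

pc_table <- data.frame(
  PC      = paste0("PC", 1:5),
  pct_var = round(pct_var[1:5], 1),
  r2_sex  = sapply(1:5, function(i) r2(pca$x[, i], sex)),
  r2_type = sapply(1:5, function(i) r2(pca$x[, i], type))
)
print(pc_table)
```

Because sex and condition overlap so heavily in this design, a PC driven purely by sex will also show a sizeable R^2 against condition, so the two columns need to be read together.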

The imbalanced nature of gender adds an extra element to this. I would, out of interest, perform a stratified analysis for males and females separately, and then a combined analysis run twice: once with gender in the design formula and once without.
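A sketch of the combined analysis with and without the covariate, again on simulated counts. The helper fit_type() and the column names of the comparison table are hypothetical, introduced only for this example.

```r
# Sketch: fit the same data with and without Sex in the design and put
# the Type results side by side. Counts are simulated (assumption).
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 22)
dds$Sex  <- factor(c(rep("male", 11), rep("female", 2),
                     rep("male", 1),  rep("female", 8)))
dds$Type <- factor(c(rep("control", 13), rep("case", 9)),
                   levels = c("control", "case"))

# Hypothetical helper: refit under a given design, return case vs control
fit_type <- function(dds, design_formula) {
  design(dds) <- design_formula
  dds <- DESeq(dds, quiet = TRUE)
  results(dds, contrast = c("Type", "case", "control"))
}

res_adj   <- fit_type(dds, ~ Sex + Type)  # adjusted for Sex
res_unadj <- fit_type(dds, ~ Type)        # unadjusted

# Both result tables keep the gene order of dds, so columns line up
comparison <- data.frame(
  gene            = rownames(res_adj),
  lfc_adjusted    = res_adj$log2FoldChange,
  lfc_unadjusted  = res_unadj$log2FoldChange,
  padj_adjusted   = res_adj$padj,
  padj_unadjusted = res_unadj$padj
)
head(comparison)
```

The stratified runs could reuse the same helper on subsets such as dds[, dds$Sex == "male"] with ~ Type as the design.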

By taking multiple lines of evidence, you'll be better equipped to address the issue.

Kevin


Thank you Kevin for the detailed answer. Unfortunately, in the PCA, samples separate by gender almost the same way they separate by condition. I think this could be expected, since the condition groups are very strongly skewed by gender. I will try the other analyses suggested, but I think the stratified one would be tricky: I would end up comparing 1 case vs 11 controls for males, and 8 cases vs 2 controls for females, if I understood correctly.

3.6 years ago by sigalottil • 0
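For reference, the cell counts described above can be laid out with a few lines of base R. This is only a sketch of the sample sheet from the question, with no expression data involved; the variable names are my own.

```r
# Sketch: the design implied by the question, using only the group sizes.
sex  <- factor(c(rep("male", 11), rep("female", 2),
                 rep("male", 1),  rep("female", 8)))
type <- factor(c(rep("control", 13), rep("case", 9)),
               levels = c("control", "case"))

# Cross-tabulation: each row is one stratum of a stratified analysis
cells <- table(Sex = sex, Type = type)
print(cells)

# Is ~ sex + type full rank, i.e. are both effects estimable at all?
X <- model.matrix(~ sex + type)
full_rank <- qr(X)$rank == ncol(X)

# Association between the two binary factors (phi coefficient) and the
# resulting variance inflation for either coefficient
phi <- cor(as.numeric(sex), as.numeric(type))
vif <- 1 / (1 - phi^2)

cat("full rank:", full_rank, "\n")
cat("phi:", round(phi, 3), " VIF:", round(vif, 2), "\n")
```

The design is full rank, so a model with both terms can be fitted, but the strong association means the adjusted condition estimate leans heavily on the few discordant samples (the 2 female controls and the 1 male case).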

Yes, the imbalance will result in exaggerated / biased p-values and fold-change estimates, but I thought it would be interesting to do for your own investigation. If there is definite separation, then perhaps try to control for gender by including it in the design formula.

Either way, as you can see, there is no 'one size fits all' solution.

In any epidemiological study, we would report both covariate-adjusted and non-adjusted test statistics, so I see no reason why we cannot do the same for omics-style data.

3.6 years ago by Kevin Blighe 87k