DESeq2_ How to Account for the Expression Levels of a Specific Gene?
2
0
Entering edit mode
4 weeks ago
BioNovice247 ▴ 20

Hello everyone,

I am working on RNA-seq datasets and the question I'm interested in is "Keeping the expression levels of gene X constant, which genes are differentially expressed between conditions A and B?".

I think one way to do this is by using the design ~ Expression + Condition (where Expression is the expression levels of the gene X as a continuous variable) followed by extraction of the Condition effect by lfcShrink(dds, coef = "Condition_B_vs_A", type = "apeglm"). My questions:

  1. Is this a valid approach for my purpose? if not, why? and what would be a valid way to achieve this? If yes:
  2. How should the expression levels of gene X be transformed prior to including it in the design? Should I use VST counts? Are centering and scaling required?

Thanks in advance for any help and guidance.

Differential-Expression-Analysis DESeq2 • 489 views
ADD COMMENT
2
Entering edit mode
29 days ago

See ?estimateSizeFactors, specifically the controlGenes parameter. You can normalize to whatever set of genes you want and run a standard analysis afterwards. Whether that's a good idea (pro tip, it's usually not) is a different question.

I'm be particularly wary of doing this with a single gene for a typical bulk RNA-seq experiment.

ADD COMMENT
0
Entering edit mode

Thank you very much for your responses jared.andrews07 and Thomas Neff

In my example, gene X is representative of a marker for a specific cell type (e.g., NK cells). By accounting for the expression of this gene, I aim to account for the differences in the abundance of this specific cell type so that the differential abundance of this cell type would not have an effect on the downstream observations (I'm using this in two different cases. In one case, the differential abundance of the cell type is due to batch effect, and in another, it's a biologic effect that, regardless, we don't want to affect the observations). I assume I can input this gene (or multiple validated markers of the specific cell type) as the controlGenes (right?), but would you please elaborate as to why including this gene (or maybe the estimated abundance of the cell type from a deconvolution approach) in the design does not work? I believe incorporation of continuous variables such as age to account for their effects is regularly used in the field, so what's different in this case?

Also

Whether that's a good idea (pro tip, it's usually not) is a different question.

If this is also not a good idea, then do you know of any reasonable way to handle such situations?

ADD REPLY
0
Entering edit mode

so what's different in this case

"Control" genes are still variable across samples even if relatively consistent. A 10-20% change of a very highly expressed gene may be relatively meaningless - but it still happens and anyone who's done much qRT-PCR will be well aware of this. Hence why using a panel of housekeeping genes for qRT-PCR is usually a good idea. The same is largely true in RNA-seq, and normalizing to a single gene (or a handful of them) that should be pretty identical can still have dramatic effects. The RUVseq paper does a good job of demonstrating this with spike-ins inserted into the samples at supposedly equivalent levels.

If this is also not a good idea, then do you know of any reasonable way to handle such situations?

Sort and mix your samples in equal proportions of your problematic cell type vs "others" or probably more simply on the front-end, do single cell RNA-seq. Though the analysis for that is considerably more annoying and time-consuming. Deconvolution of your bulk samples via CIBERSORTx or such would indicate how great the difference in abundance actually is, which may provide some additional avenues.

ADD REPLY
0
Entering edit mode
29 days ago
Thomas Neff ▴ 10

You can also look at normalization methods such as RUVseq, RUVIII, CBCS etc which are based on housekeeping genes expression to perform normalization.

ADD COMMENT
0
Entering edit mode

Indeed, these are other options. The RUVseq paper does a good job demonstrating the issues with normalizing to a set of supposedly static controls in the form of ERCC spike-ins.

ADD REPLY

Login before adding your answer.

Traffic: 2682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6