Normalization to "housekeeping" genes: RNAseq data vs qPCR results
2.1 years ago
cwwong13 ▴ 20

It is relatively common to use "housekeeping" genes when performing qPCR experiments. I would like to know how common (or uncommon) it is for a gene expression change found in RNAseq to be replicated with qPCR.

Indeed, I usually get fairly comparable results between these two experimental techniques. However, I recently shifted to studying metabolism-related pathways and found that many of the transcriptional changes are not consistent between RNAseq and qPCR. One possibility is that the metabolic shift induces changes in a large array of genes, which might include the "housekeeping" gene I chose (I usually use actin or 36b4).

An example: in the RNAseq data, the actin level is downregulated after the treatment, which makes actin invalid as a "housekeeping" gene. On the other hand, I wonder whether that apparent down-regulation of actin could result from a general up-regulation of all other genes. Given that many RNAseq analyses are still normalized to the total library size, I am not sure whether an analysis pipeline such as DESeq2 can deal with this.
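To make the concern concrete, here is a minimal sketch in base R. All numbers and gene names are made up for illustration. It assumes actin stays constant in absolute terms while every other gene doubles, and then applies plain library-size scaling.

```r
# Hypothetical absolute transcript amounts per cell (illustrative numbers only).
control <- c(actin = 1000, geneA = 500, geneB = 500, geneC = 500, geneD = 500)
others  <- c("geneA", "geneB", "geneC", "geneD")

# Treatment doubles every gene except actin.
treated <- control
treated[others] <- 2 * control[others]

# Library-size normalization: each gene as a share of the total, per million.
cpm <- function(x) x / sum(x) * 1e6

# Apparent log2 fold changes after normalization.
log2fc <- log2(cpm(treated) / cpm(control))
print(round(log2fc, 2))
# actin comes out at about -0.74 although it did not change,
# and the other genes at about +0.26 instead of the true +1.
```

As far as I understand, sequencing data of this kind are relative, so a median-based normalization would not separate the two scenarios either: if most genes truly double, that doubling becomes the baseline and the unchanged gene looks downregulated.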

I wonder whether there are any means to correct for this, or at least to diagnose the data structure/quality, so that I know when I should interpret the RNAseq/qPCR results with caution.


qPCR RNAseq • 2.9k views

I do not generally buy that qPCR is a good verification for RNA-seq. A really meaningful and solid qPCR comparison, assuming that the RNA-seq was conducted and analysed properly in terms of replication and statistics, should imho address these points:

  • definition of housekeeping genes: While there are a couple of established housekeeping genes, one is essentially setting the baseline of an entire experiment to (often) a single gene. One should rather use a panel of housekeeping genes and then use something like the median of the panel for normalization. RNA-seq commonly uses approaches that make use of all genes while trying to find a baseline that centers most genes at a log fold change of about zero (RLE in DESeq2, TMM in edgeR, for example), or at least tries to find a median ratio that properly captures the size relationship between samples.

  • primer efficiency and uniqueness: primers in PCR are usually 18-25 bp long. That makes them far less unique than e.g. 2x75 or 2x150 bp reads. For a good qPCR one must therefore ensure that primers have no detectable off-targets and have decent efficiency, because one gets only one readout per gene, while in RNA-seq the coverage is highly redundant and therefore (probably) more robust, especially with paired-end reads. Also, one would probably need multiple primer sets for the same gene to exclude amplicon-specific biases. Primer validation is tedious, and something like melting curves is (at least in my head) rather pointless, as after 40 PCR cycles any tiny contamination will produce some kind of product that may introduce shoulders in these profiles.

  • non-standardized statistics: while RNA-seq has a couple of very robust statistical frameworks such as DESeq2 and edgeR, people often do custom (and sometimes inappropriate) statistics for qPCR, in combination with few housekeeping controls and primers per gene/transcript. Tests such as the commonly used t-test assume approximately normally distributed data, which is not necessarily the case. Non-parametric statistics such as the Wilcoxon test also require more replicates than RNA-seq to even have the chance to reach significance. A 3 vs 3 comparison with a Wilcoxon test will at best yield a p-value of 0.1, even if all ranks of group 1 are lower than those of group 2 and even without multiple testing correction, e.g.:

> wilcox.test(c(1,2,3), c(10,11,12))

    Wilcoxon rank sum exact test

data:  c(1, 2, 3) and c(10, 11, 12)
W = 0, p-value = 0.1
alternative hypothesis: true location shift is not equal to 0
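The 0.1 floor follows from counting rank arrangements. A small base R sketch, assuming no ties and the exact two-sided test (the helper name `min_p_wilcox` is made up here), shows how many replicates per group are needed before p < 0.05 becomes reachable at all:

```r
# Smallest possible two-sided p-value of the exact rank-sum test without ties:
# only 2 of the choose(n1 + n2, n1) possible rank arrangements are maximally
# extreme (all of group 1 below group 2, or all above).
min_p_wilcox <- function(n1, n2) 2 / choose(n1 + n2, n1)

min_p_wilcox(3, 3)  # 2/20  = 0.1, the value reported by wilcox.test above
min_p_wilcox(4, 4)  # 2/70,  roughly 0.029
min_p_wilcox(5, 5)  # 2/252, roughly 0.008
```

So with this test at least 4 vs 4 is needed to get below 0.05 for a single gene, and more once multiple testing correction is applied.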

So you would need several housekeeping genes, several sets of primers, and many replicates to make a fair qPCR experiment when challenging RNA-seq. But this is obviously a lot of work; anyone who got cramps in the thumb after pipetting several 96-well plates in a row knows that.
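For the housekeeping panel idea, a minimal base R sketch could look like the following. The Ct values, gene names and sample names are invented, and the fold change step assumes roughly 100% primer efficiency (the usual 2^-ddCt shortcut), which would need to be checked for real primers.

```r
# Hypothetical Ct values: rows are genes, columns are samples (made-up numbers).
ct <- matrix(
  c(18.0, 18.2, 18.1, 18.3,
    20.1, 20.0, 20.2, 20.1,
    22.0, 22.3, 22.1, 22.2,
    25.0, 25.2, 23.1, 23.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ref1", "ref2", "ref3", "target"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)
refs <- c("ref1", "ref2", "ref3")

# Per-sample baseline: median Ct of the reference panel instead of one gene.
baseline <- apply(ct[refs, ], 2, median)

# Delta-Ct of the target gene relative to the panel baseline.
dct <- ct["target", ] - baseline

# Delta-delta-Ct (treated minus control) and the implied fold change.
ddct <- mean(dct[c("trt1", "trt2")]) - mean(dct[c("ctrl1", "ctrl2")])
fold_change <- 2^(-ddct)
```

Using the mean Ct of the panel instead of the median is a common alternative; with only three reference genes the two usually give similar baselines.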

RNA-seq is often used to build a hypothesis rather than to check individual genes. I would therefore prefer to validate the generated hypothesis, e.g. via a knockdown experiment of a pathway that was hypothesized to be altered after analysing the expression data, rather than qPCR-ing some individual pathway genes. If you are interested in individual genes, then do qPCR; for the global picture I would always prefer RNA-seq followed by experimental functional validation of results.

Not sure whether this contributed anything to your question, I just felt like ranting about qPCR :p


I totally agree with you on the qPCR validation issue, but sadly it still seems to be common practice among experimental biologists.

I am also curious about what a proper normalization would be. Do you think we should normalize to total cell number in bulk RNAseq? Do you have an idea (or suggestion on a published article that will be even better, so I can just cite it or show it to my supervisor) of how library size is correlated to the cell number?

Sorry for asking such a fundamental question here. I just realized that I am missing the basics of statistics and bioinformatics, even though I can run/modify code from online tutorials.


Do you think we should normalize to total cell number in bulk RNAseq?

No, I do not see how this a) would be possible or b) would make sense. DESeq2 and edgeR work very well for this purpose. In fact, I have never heard of normalization to a cell number.

Do you have an idea (or suggestion on a published article that will be even better, so I can just cite it or show it to my supervisor) of how library size is correlated to the cell number?

If by library size you mean sequencing depth, then there is no correlation. The more reads, the larger the library size; it is a technical thing.
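To illustrate how a depth difference is scaled away, here is a rough base R sketch of the median-of-ratios idea behind DESeq2-style size factors. It is not the DESeq2 code itself (in practice one would call `estimateSizeFactors` on a DESeqDataSet), and the counts and the helper name `size_factors` are made up.

```r
# Made-up count matrix: rows are genes, columns are samples.
# sampleB was sequenced twice as deep; gene5 is the only truly changed gene.
counts <- matrix(
  c(100, 200,
     50, 100,
     30,  60,
    400, 800,
     10, 200),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("sampleA", "sampleB"))
)

size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))   # per-gene log geometric mean across samples
  keep <- is.finite(log_geo_mean)    # drop genes with a zero count in any sample
  # Per sample: median log ratio to the geometric-mean reference, back-transformed.
  apply(m, 2, function(col) exp(median(log(col[keep]) - log_geo_mean[keep])))
}

sf <- size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")   # divide each column by its size factor
```

Because the median ignores the single outlying gene, the size factors differ by exactly the factor of two in depth, genes 1 to 4 end up equal between samples, and gene5 keeps its 20-fold difference.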

2.1 years ago
JC 13k

There is the view that any gene can be variable under some condition, so there are no true "housekeeping" genes. For your experiments, you should include a few typically non-variable genes and select those that are not variable for normalization.
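One simple way to do that selection, sketched in base R with invented Ct values: compute the spread of each candidate across all samples and keep the ones that barely move. This assumes equal RNA input per reaction, and the 0.5-cycle cutoff is an arbitrary choice for illustration, not an established threshold.

```r
# Hypothetical Ct values for three candidate reference genes (made-up numbers).
ct <- matrix(
  c(18.0, 18.2, 18.1, 18.3,   # candA: stable
    20.1, 20.0, 21.6, 21.8,   # candB: shifts with treatment
    22.0, 22.3, 22.1, 22.2),  # candC: stable
  nrow = 3, byrow = TRUE,
  dimnames = list(c("candA", "candB", "candC"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

# Standard deviation of Ct across all samples, per candidate gene.
ct_sd <- apply(ct, 1, sd)

# Keep candidates whose Ct varies by less than the (arbitrary) cutoff.
stable <- names(ct_sd)[ct_sd < 0.5]
print(stable)
```

Here candB would be dropped because its Ct shifts by about 1.5 cycles after treatment, while candA and candC would be kept for normalization.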

