Question

qRT-PCR data analysis steps and workflow

5.2 years ago

mohammedtoufiq91 ▴ 250

Hi,

I am currently working with Fluidigm qRT-PCR data. There are 3 plates, each with 96 genes (88 target genes + 8 reference genes), combined into one file of 288 genes in total: 264 target genes + 8 * 3 (triplicate) = 24 reference genes. In summary, I have 264 target genes and 8 * 3 = 24 reference genes in around 45 samples. Each sample was run in technical replicates.

I would like to know whether the methodology below is correct for the data analysis.

A. Handling multiple reference/housekeeping genes

Since I have the 8 reference genes in triplicate for each sample in the combined file, I created a data file with these reference genes across all samples and then:

Average the on-chip reference genes for each sample (arithmetic mean of the 8 * 3 reference gene values, leaving 8 * 1 reference genes per sample)
Identify the most stable reference genes, for instance the top 4, using appropriate in-silico approaches (geNorm, NormFinder, BestKeeper, etc.) based on their stability values (e.g. the geNorm M-value)
Create a pseudogene by calculating the geometric mean of the top 4 stable reference genes across samples
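
The three steps in A can be sketched in Python (standard library only). This is a minimal sketch, not the geNorm software itself: it assumes ~100% amplification efficiency (so a log2 expression ratio equals a Ct difference) and uses the sample standard deviation, and the helper names (`average_replicates`, `genorm_m`, `select_stable`, `pseudogene_ct`) and Ct values are hypothetical. geNorm's normalization factor is the geometric mean of the reference genes' relative quantities (2^-Ct); converted back to the Ct scale, that equals the arithmetic mean of their Ct values, which is what `pseudogene_ct` returns.

```python
import math
import statistics

def average_replicates(ct_reps):
    """Collapse technical replicates: {sample: {gene: [Ct, ...]}} -> {sample: {gene: mean Ct}}."""
    return {s: {g: statistics.mean(v) for g, v in genes.items()}
            for s, genes in ct_reps.items()}

def genorm_m(ct, genes):
    """geNorm-style stability value M (lower = more stable).

    For gene j, M_j is the mean, over every other gene k, of the standard
    deviation across samples of log2(Q_j / Q_k). Assuming 100% efficiency,
    log2(Q_j / Q_k) = Ct_k - Ct_j.
    """
    samples = list(ct)
    m = {}
    for j in genes:
        sds = [statistics.stdev([ct[s][k] - ct[s][j] for s in samples])
               for k in genes if k != j]
        m[j] = statistics.mean(sds)
    return m

def select_stable(ct, genes, keep):
    """Stepwise exclusion: drop the gene with the highest M, recompute, repeat."""
    remaining = list(genes)
    while len(remaining) > keep:
        m = genorm_m(ct, remaining)
        remaining.remove(max(remaining, key=m.get))
    return remaining

def pseudogene_ct(ct_sample, genes):
    """Geometric mean of relative quantities 2^-Ct, converted back to the Ct scale."""
    geo_q = statistics.geometric_mean([2 ** -ct_sample[g] for g in genes])
    return -math.log2(geo_q)

# Hypothetical replicate-averaged Cts: R4 drifts relative to R1-R3.
example_ct = {
    "S1": {"R1": 20.0, "R2": 22.0, "R3": 24.0, "R4": 18.0},
    "S2": {"R1": 20.5, "R2": 22.5, "R3": 24.5, "R4": 21.0},
    "S3": {"R1": 21.0, "R2": 23.0, "R3": 25.0, "R4": 17.0},
}
# keep=3 here; the post keeps the top 4 of 8 reference genes
stable = select_stable(example_ct, ["R1", "R2", "R3", "R4"], keep=3)
pseudo = {s: pseudogene_ct(genes, stable) for s, genes in example_ct.items()}
```

NormFinder and BestKeeper rank genes with different statistics, so their top 4 may not match this ranking.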

B. After this:

1. Create a new data file by averaging (arithmetic mean) the technical replicates across all samples for the remaining genes, i.e. all 264 target genes + the pseudogene (geometric mean of the top 4 reference genes). For instance:

Detector | Target Gene 1 | Target Gene 2 | . . . | Pseudogene

2. Calculate ΔCt (difference between the target gene and the reference gene, i.e. the pseudogene)
3. Calculate ΔΔCt (difference between each sample's ΔCt and the average ΔCt of the control samples)
4. Calculate 2^(-ΔΔCt) to estimate the fold change in gene expression
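
The ΔCt, ΔΔCt and 2^(-ΔΔCt) calculations above can be sketched for a single target gene as follows. This assumes Cts are already replicate-averaged and ~100% efficiency (the base 2 in 2^(-ΔΔCt)); the function names, sample names and Ct values are hypothetical.

```python
import statistics

def delta_ct(target_ct, ref_ct):
    """ΔCt per sample: target Ct minus reference (pseudogene) Ct."""
    return {s: target_ct[s] - ref_ct[s] for s in target_ct}

def delta_delta_ct(dct, control_samples):
    """ΔΔCt per sample: ΔCt minus the mean ΔCt of the control samples."""
    control_mean = statistics.mean(dct[s] for s in control_samples)
    return {s: d - control_mean for s, d in dct.items()}

def fold_change(ddct):
    """Relative expression 2^(-ΔΔCt) per sample."""
    return {s: 2 ** -d for s, d in ddct.items()}

# Hypothetical Cts for one target gene and the pseudogene
target = {"C1": 25.0, "C2": 25.5, "T1": 23.0, "T2": 23.5}
pseudo = {"C1": 20.0, "C2": 20.0, "T1": 20.0, "T2": 20.0}

dct = delta_ct(target, pseudo)            # C1 5.0, C2 5.5, T1 3.0, T2 3.5
ddct = delta_delta_ct(dct, ["C1", "C2"])  # control mean ΔCt = 5.25
fc = fold_change(ddct)                    # T1: 2 ** 2.25, about 4.76
```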

Please let me know whether the above analysis methodology looks fine.

qRT-PCR Reference genes Normalization DeltaCt FC

5.2 years ago by mohammedtoufiq91 ▴ 250

It's unclear what exactly the "83 (triplicate)" is actually referring to, unless you have different but slightly overlapping reference gene sets by sample. Likewise, I'm not sure how taking the mean of the reference genes changes their number (from 83 to 81). Aside from that, the methodology seems fine (namely, creating a stable reference and then using the standard delta-delta Ct method).

5.2 years ago by Devon Ryan 104k

Hi Devon,

Thank you. I have reposted the question. Earlier, there was a formatting issue with the mathematical signs, so the numbers were not displayed correctly.

5.2 years ago by mohammedtoufiq91 ▴ 250

It's much clearer now; what you're doing looks perfectly correct.

5.2 years ago by Devon Ryan 104k

Thank you for the reply Devon.

5.2 years ago by mohammedtoufiq91 ▴ 250

Hi Devon,

Do you have any suggestions for the following questions?

1. Which statistical test could be used, for instance, to compare Before vs After groups at multiple time points?
2. What types of plots could be generated for the comparison?
3. Which values (DeltaCt, Delta Delta Ct, or 2^(-ΔΔCt)) should be used for further downstream analyses such as Gene Ontology, gene enrichment analysis, and pathway analysis?

5.2 years ago by mohammedtoufiq91 ▴ 250

1. A t-test or similar (use the delta-delta Ct values for this, since they tend to be more normally distributed).
2. A scatter plot would be fine.
3. Either delta-delta Ct or fold changes would work (fold changes are probably easier to think about).

5.2 years ago by Devon Ryan 104k
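
On the statistical test question, a minimal standard-library sketch of Welch's two-sample t statistic (the helper `welch_t` and the ΔCt values are hypothetical). Because ΔΔCt is ΔCt minus a constant (the control-group mean ΔCt), a t-test on ΔΔCt gives the same statistic as one on ΔCt; both are on the log2 scale, unlike 2^(-ΔΔCt). In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns Welch's statistic with a p-value, and `scipy.stats.ttest_rel` covers paired Before/After samples.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

before_dct = [5.0, 5.5, 4.5]  # hypothetical ΔCt values
after_dct = [3.0, 3.5, 2.0]

control_mean = statistics.mean(before_dct)  # treat Before as the control group
before_ddct = [d - control_mean for d in before_dct]
after_ddct = [d - control_mean for d in after_dct]

t_dct, df_dct = welch_t(before_dct, after_dct)
t_ddct, df_ddct = welch_t(before_ddct, after_ddct)  # same t and df as on ΔCt
```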

Thank you. I have another question. In "A. Handling multiple reference/housekeeping genes", I described selecting the top 4 reference genes with different methods (geNorm, NormFinder, etc.) and creating a pseudogene from those 4 genes by geometric mean.

Does averaging all 8 reference genes directly into a pseudogene, skipping the stability-ranking step in A, sound fine? That is, calculating DCt as the difference between each of the 264 target genes and a single reference gene, where the reference is the arithmetic average of the 8 reference genes:

DCt = Target Gene 1 - Pseudogene (arithmetic average of the 8 reference genes)

5.2 years ago by mohammedtoufiq91 ▴ 250
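
The alternative asked about here (a pseudogene from the plain arithmetic mean of all 8 reference genes, with no stability ranking) could be computed for every target gene at once roughly like this. It assumes Cts are already replicate-averaged; `delta_ct_matrix` and the example values are hypothetical.

```python
import statistics

def delta_ct_matrix(ct, targets, refs):
    """ct: {sample: {gene: Ct}}. Returns {sample: {target: ΔCt}}, using the
    arithmetic mean of all genes in `refs` as the pseudogene Ct."""
    out = {}
    for sample, genes in ct.items():
        pseudo = statistics.mean(genes[r] for r in refs)
        out[sample] = {t: genes[t] - pseudo for t in targets}
    return out

# Hypothetical data: 2 targets, 2 reference genes (8 in the post)
ct = {
    "S1": {"T1": 26.0, "T2": 24.0, "R1": 20.0, "R2": 22.0},
    "S2": {"T1": 25.0, "T2": 24.5, "R1": 20.5, "R2": 21.5},
}
dct = delta_ct_matrix(ct, targets=["T1", "T2"], refs=["R1", "R2"])
```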

Taking the geometric or another mean of those seems reasonable; in effect, that's similar to what we do for normalization in RNA-seq.

5.2 years ago by Devon Ryan 104k
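
A small numerical check of how the two kinds of mean in this thread relate (hypothetical Ct values, ~100% efficiency assumed): the arithmetic mean of Ct values equals -log2 of the geometric mean of the relative quantities 2^-Ct, so "arithmetic mean of Cts" and "geometric mean of quantities" describe the same pseudogene. A geometric mean taken over the raw Ct values themselves is a different, slightly smaller number (AM-GM inequality).

```python
import math
import statistics

# Hypothetical replicate-averaged Cts of 8 reference genes in one sample
ref_cts = [18.2, 19.7, 21.1, 22.4, 20.0, 19.3, 23.8, 20.9]

arith_ct = statistics.mean(ref_cts)                            # arithmetic mean of Cts
geo_q = statistics.geometric_mean([2 ** -c for c in ref_cts])  # geometric mean of 2^-Ct
ct_from_geo_q = -math.log2(geo_q)                              # back on the Ct scale

geo_ct = statistics.geometric_mean(ref_cts)                    # geometric mean of raw Cts
```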

Agreed, this seems fine. Perhaps a minor addition: computing ΔΔCt and 2^(-ΔΔCt) is of course fine if you are interested in fold changes relative to a reference group, and it's popular with biologists. But if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*ΔCt as your readout is quite valid too.

5.2 years ago by Ahill ★ 1.9k

Hi Ahill,

Could you please provide more information on this:

but if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*ΔCt as your readout is quite valid too.

We have Before and After groups at multiple time points.

5.2 years ago by mohammedtoufiq91 ▴ 250

We often have multi-group qPCR experimental designs where we simply want a normalized log2-scale relative expression measure, where larger values indicate higher expression. -1*dCt is just that number. In that sense, -1*dCt is analogous to array-based expression measures like RMA or RNA-Seq measures like log2CPM, which our audience is often very used to seeing. For downstream analyses like visualization, clustering, or other supervised or unsupervised analyses, especially when there is not a single obvious reference group, -1*dCt can be a good fit.

In that context, ddCt and 2^-ddCt, while of course perfectly valid and familiar to audiences thinking about fold changes from a reference group, just represent additional derived calculations (log-differencing and linearization) that don't provide any practical benefits in terms of interpretability, suitability for statistical analysis, or variance/precision of the readout. While I don't know the details of your experimental design, if your interest is in multiple paired comparisons (before-after), then you could certainly run paired linear modeling analyses on -1*dCt values.

5.2 years ago by Ahill ★ 1.9k
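
For the Before/After design at several time points, the simplest special case of the paired analysis described here is a paired t statistic per time point on -ΔCt. The paired difference of -ΔCt is the per-subject log2 fold change (After vs Before), and with only two conditions a paired t-test matches a linear model with a subject term. This is a hedged sketch: `paired_summary`, the time-point labels and the values are hypothetical, the lists must hold the same subjects in the same order, and for p-values you would use something like `scipy.stats.ttest_rel` or a proper paired linear model.

```python
import math
import statistics

def paired_summary(before_dct, after_dct):
    """Mean paired log2 fold change (After vs Before) and paired t statistic, on -ΔCt."""
    if len(before_dct) != len(after_dct):
        raise ValueError("before and after must list the same subjects")
    # -ΔCt(after) - (-ΔCt(before)) = per-subject log2 fold change
    diffs = [(-a) - (-b) for b, a in zip(before_dct, after_dct)]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d, mean_d / se

# Hypothetical ΔCt values for one gene, 3 subjects per time point
data = {
    "tp1": {"before": [5.0, 6.0, 5.5], "after": [4.0, 4.5, 4.5]},
    "tp2": {"before": [5.0, 6.0, 5.5], "after": [5.5, 6.0, 5.0]},
}
results = {tp: paired_summary(d["before"], d["after"]) for tp, d in data.items()}
```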

Thank you. Indeed it is helpful.

5.2 years ago by mohammedtoufiq91 ▴ 250

Hi,

When I plot a hierarchical clustering heatmap of the significant genes (using -DCt) between the two conditions, Before vs After, I observe opposite profiles, i.e. Before (red, max) and After (green, min). However, when I run hierarchical clustering on an ANOVA gene list created from DCt, the profiles are reversed: After (red) and Before (green). I am a bit confused, as the FC values for those genes indicate up-regulation in the statistical test. Could you please give me some insight on this, and tell me which values should be used for the heatmap, DCt or negative DCt?

Thank you, Best Regards, Toufiq

5.0 years ago by mohammedtoufiq91 ▴ 250
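
On the heatmap sign question: ΔCt runs opposite to expression (a lower ΔCt means the target crosses the threshold earlier relative to the reference, i.e. higher expression), so row-scaled heatmaps of ΔCt and -ΔCt show the same genes with exactly inverted colours. A minimal sketch with hypothetical values: row z-scores of -ΔCt are the negatives of those of ΔCt, while Euclidean distances are unchanged by negating all values, so the dendrogram structure is unaffected and only the colour direction flips. If the colour scale is meant to read "red = higher expression", -ΔCt is the matching choice.

```python
import math
import statistics

def row_zscore(values):
    """Scale one gene's row to mean 0, SD 1 (the row scaling many heatmap functions offer)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical ΔCt for two genes in samples Before1, Before2, After1, After2
gene_a = [6.0, 5.5, 3.0, 3.5]
gene_b = [4.0, 4.5, 5.0, 5.5]
neg_a = [-v for v in gene_a]
neg_b = [-v for v in gene_b]

z_dct = row_zscore(gene_a)  # After samples negative: low ΔCt
z_neg = row_zscore(neg_a)   # After samples positive: high expression

dist_dct = math.dist(gene_a, gene_b)  # gene-gene distance on ΔCt
dist_neg = math.dist(neg_a, neg_b)    # identical on -ΔCt
```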