Question: qRT-PCR data analysis steps and workflow
4 weeks ago, mohammedtoufiq91 wrote:

Hi,

I am currently working with Fluidigm qRT-PCR data. There are 3 plates combined into one file, giving 288 assays in total (264 target genes + 8 reference genes * 3 (triplicate) = 24 reference gene measurements). Each plate consists of 96 assays (88 target genes + 8 reference genes). In summary, I have 264 target genes and 8 * 3 = 24 reference gene measurements in around 45 samples. Each sample has technical replicates.

I want to know if the methodology followed below is correct during the data analysis?

A. Handling multiple reference/housekeeping genes

Since I have 8 reference genes in triplicate for each sample in the combined file, I created a data file with these reference genes across all samples:

  1. Average the on-chip reference genes at the sample level (arithmetic mean of the 8 * 3 reference gene measurements, giving 8 * 1 reference gene values for each sample)
  2. Identify the most stable reference genes (for instance, the top 4) using appropriate in silico approaches (geNorm, NormFinder, BestKeeper, etc.) based on the M-value
  3. Create a pseudogene by calculating the geometric mean of the top 4 stable reference genes for each sample
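
The three steps in A can be sketched in Python as follows. This is only a minimal sketch, not the geNorm tool itself: the Ct values, sample names and gene names are made up, only 4 reference genes are shown instead of 8, and the stability score is a simplified geNorm-style M-value (mean standard deviation of the pairwise Ct differences) without geNorm's stepwise exclusion. The geometric mean is taken over Ct values, as described in the post; note that an arithmetic mean of Ct values is what corresponds to a geometric mean of linear-scale quantities.

```python
from statistics import geometric_mean, mean, stdev

# Hypothetical Ct values: raw_ct[sample][reference gene] -> three replicates.
raw_ct = {
    "S1": {"REF1": [17.9, 18.0, 18.1], "REF2": [19.9, 20.0, 20.1],
           "REF3": [21.9, 22.0, 22.1], "REF4": [24.9, 25.0, 25.1]},
    "S2": {"REF1": [18.4, 18.5, 18.6], "REF2": [20.4, 20.5, 20.6],
           "REF3": [22.4, 22.5, 22.6], "REF4": [26.9, 27.0, 27.1]},
    "S3": {"REF1": [18.9, 19.0, 19.1], "REF2": [20.9, 21.0, 21.1],
           "REF3": [22.9, 23.0, 23.1], "REF4": [23.9, 24.0, 24.1]},
}

# Step 1: arithmetic mean of the replicates for each sample and gene.
ref_ct = {s: {g: mean(reps) for g, reps in genes.items()}
          for s, genes in raw_ct.items()}
samples = sorted(ref_ct)
genes = sorted(ref_ct[samples[0]])


def m_value(gene):
    """Simplified M-value: mean SD (across samples) of the Ct difference
    between this gene and every other reference gene."""
    sds = [stdev([ref_ct[s][gene] - ref_ct[s][other] for s in samples])
           for other in genes if other != gene]
    return mean(sds)


# Step 2: rank genes by stability (lower M means more stable).
stability = {g: m_value(g) for g in genes}
top_n = 3  # the post keeps the top 4 of 8; 3 of 4 here for brevity
top_genes = sorted(genes, key=stability.get)[:top_n]

# Step 3: one pseudogene Ct per sample from the most stable genes.
pseudogene = {s: geometric_mean([ref_ct[s][g] for g in top_genes])
              for s in samples}

print(top_genes)
print(pseudogene)
```

In this toy data REF4 drifts independently of the other three genes, so it gets the highest M-value and is left out of the pseudogene.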

B. After this:

  1. Create a new data file by averaging (arithmetic mean) the technical replicates in every sample for the remaining genes, i.e. all 264 target genes plus the pseudogene (geometric mean of the top 4 reference genes). For instance, the columns would be:

     Detector | Target Gene 1 | Target Gene 2 | ... | Pseudogene

  2. Calculate △Ct (difference between the target gene and the reference gene, i.e. the pseudogene)
  3. Calculate △△Ct (difference between the △Ct of the sample and the average △Ct of the control samples)
  4. Calculate 2^(-△△Ct) to evaluate fold changes in gene expression
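
The calculation in B can be sketched for a single target gene as follows. This is a minimal sketch under stated assumptions: the Ct values, the sample names and the choice of which samples count as controls are all hypothetical.

```python
from statistics import mean

# Hypothetical replicate-averaged Ct values per sample.
ct = {
    "ctrl_1": {"GENE1": 25.0, "pseudogene": 20.0},
    "ctrl_2": {"GENE1": 25.4, "pseudogene": 20.2},
    "trt_1": {"GENE1": 23.1, "pseudogene": 20.0},
    "trt_2": {"GENE1": 23.3, "pseudogene": 20.1},
}
controls = ["ctrl_1", "ctrl_2"]  # assumed control group

# Delta Ct: target gene minus reference (pseudogene), per sample.
delta_ct = {s: v["GENE1"] - v["pseudogene"] for s, v in ct.items()}

# Delta-delta Ct: sample delta Ct minus the mean delta Ct of the controls.
control_mean = mean(delta_ct[s] for s in controls)
ddct = {s: d - control_mean for s, d in delta_ct.items()}

# Fold change relative to the control group.
fold_change = {s: 2 ** (-dd) for s, dd in ddct.items()}

for s in ct:
    print(s, round(delta_ct[s], 2), round(ddct[s], 2), round(fold_change[s], 2))
```

Here trt_1 has a △△Ct of about -2, which corresponds to roughly a 4-fold increase over the control average.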

Please let me know if the above analysis methodology looks fine?

modified 4 weeks ago • written 4 weeks ago by mohammedtoufiq91

It's unclear what exactly the "83 (triplicate)" is actually referring to, unless you have different but slightly overlapping reference gene sets by sample. Likewise, I'm not sure how taking the mean of the reference genes changes their number (from 83 to 81). Aside from that, the methodology seems fine (namely, creating a stable reference and then using the standard delta-delta Ct method).

written 4 weeks ago by Devon Ryan

Hi Devon,

Thank you. I have reposted the question once again. Earlier, there was a formatting issue with the mathematical signs. Hence, the numbers were not aligned properly.

modified 4 weeks ago • written 4 weeks ago by mohammedtoufiq91

It's much clearer now; what you're doing looks perfectly correct.

written 4 weeks ago by Devon Ryan

Thank you for the reply Devon.

modified 4 weeks ago • written 4 weeks ago by mohammedtoufiq91

Hi Devon,

Do you have any suggestions on the following questions?

  1. Which statistical test could be used, for instance, to assess significance between Before and After groups at multiple time points?
  2. What types of plots could be generated for the comparison?
  3. Which values (△Ct, △△Ct or 2^(-△△Ct)) should be used for further downstream analysis like Gene Ontology, gene enrichment analysis, and pathway analysis?

written 4 weeks ago by mohammedtoufiq91
  1. A t-test or similar (use the delta-delta Ct values for this, since they tend to be more normally distributed).
  2. A scatter plot would be fine.
  3. Either delta-delta Ct values or fold changes would work (fold changes are probably easier to think about).
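
As an illustration of point 1, a t statistic on delta-delta Ct values can be computed with the standard library alone. This is a minimal sketch under assumptions: the values are made up, the two groups are treated as independent, and the Welch (unequal variance) form is used, which is only one possible reading of "t-test or similar".

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical delta-delta Ct values for one gene in two independent groups.
control = [0.10, -0.20, 0.05, 0.05]
treated = [-1.9, -2.2, -2.0, -1.7]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(treated, control)
print(t_stat, df)
```

Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; in practice a routine such as `scipy.stats.ttest_ind(treated, control, equal_var=False)` would return both. With 264 target genes, some form of multiple-testing correction would normally be applied to the resulting p-values.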
written 4 weeks ago by Devon Ryan

Thank you. I have another question. In "A. Handling multiple reference/housekeeping genes", I described selecting the top 4 reference genes with different methods (geNorm, NormFinder, etc.) and creating a pseudogene from those 4 genes by geometric mean.

Would it also be fine to skip step A and create the pseudogene by averaging all 8 reference genes directly? In other words, DCt would be calculated as the difference between each of the 264 target genes and 1 reference gene, where the reference gene is the arithmetic average of the 8 reference genes:

DCt = Target Gene 1 - Pseudogene (arithmetic average of 8 reference genes)

modified 4 weeks ago • written 4 weeks ago by mohammedtoufiq91

Taking the geometric or other mean of those seems reasonable; in effect, that's similar to what we do for normalization in RNA-seq.

written 4 weeks ago by Devon Ryan

Agree this seems fine. Perhaps a minor addition: computing △△Ct and 2^(-△△Ct) is of course fine if you are interested in fold-changes vs. a reference group, and it's popular with biologists, but if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*△Ct as your readout is quite valid too.

written 4 weeks ago by Ahill

Hi Ahill,

Could you please provide more information on this point:

"but if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*△Ct as your readout is quite valid too."

We have Before and After types of groups for multiple time points.

written 4 weeks ago by mohammedtoufiq91

We often have multi-group qPCR experimental designs where we simply want a normalized log2-scale relative expression measure, where larger values indicate higher expression. -1*dCt is just that number. In that sense, -1*dCt is analogous to array-based expression measures like RMA or RNA-Seq measures like log2CPM, which our audience is often very used to seeing. For downstream analyses like visualization, clustering, or other supervised or unsupervised analyses, especially when there is not a single obvious reference group, -1*dCt can be a good fit. In that context, ddCt and 2^-ddCt, while of course perfectly valid and familiar to audiences thinking about fold-changes from a reference group, just represent additional derived calculations (log-differencing and linearization) that don't provide any practical benefits in terms of interpretability, suitability for stats analysis, or variance/precision of the readout. While I don't know details of your experimental design, if your interest is in multiple paired comparisons (before-after) then you could certainly execute paired linear modeling analyses on -1*dCt values.
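
To make the -1*dCt readout concrete, here is a minimal sketch of a paired before-after comparison for one gene. The subject IDs and dCt values are hypothetical, and a paired t statistic is used as the simplest special case of a paired linear model.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical dCt values (target minus reference) for the same subjects.
dct_before = {"P1": 5.0, "P2": 5.4, "P3": 4.8, "P4": 5.2}
dct_after = {"P1": 3.9, "P2": 4.5, "P3": 3.6, "P4": 4.4}

# -1*dCt: log2-scale relative expression, larger means higher expression.
expr_before = {k: -v for k, v in dct_before.items()}
expr_after = {k: -v for k, v in dct_after.items()}

# Within-subject differences (after minus before) are log2 fold changes.
diffs = [expr_after[k] - expr_before[k] for k in dct_before]
mean_diff = mean(diffs)

# Paired t statistic on the differences, with n - 1 degrees of freedom.
t_stat = mean_diff / (stdev(diffs) / sqrt(len(diffs)))
df = len(diffs) - 1

print(mean_diff, 2 ** mean_diff, t_stat, df)
```

The mean within-subject difference here is about 1 on the log2 scale, i.e. roughly a 2-fold increase after versus before, obtained without ever defining a separate control group or computing ddCt.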

modified 4 weeks ago • written 4 weeks ago by Ahill

Thank you. Indeed it is helpful.

written 4 weeks ago by mohammedtoufiq91