Question: Correlation between biological replicates
2.7 years ago by
rauwaneme wrote:

Hello everyone

I have a question. I did a transcriptome analysis on a cassava genotype harvested at 24 hr and 72 hr, compared with non-infested healthy plants. I had 3 biological replicates for each time point, analysed with TopHat/Cufflinks/Cuffdiff. I used the Cufflinks FPKM values of the 3 biological replicates to test the correlation between replicates in R and in Excel. However, I get a negative correlation, because there are large differences in the FPKM values. Can you advise how I can calculate the correlation between the replicates?

Or is there any other method I could use?



If I understand correctly, you've calculated Pearson's correlation coefficient between replicates and you obtained negative values. Although one generally expects replicates to have high positive correlation, this is not always the case. Low correlation between replicates can be used as a quality control criterion e.g. to remove bad/failed replicates.

Have you looked at the data, i.e. plotted the replicates against each other? This would make it easier to spot whether the problem is caused by outliers. You may want to use raw counts for this.

written 2.7 years ago by Jean-Karim Heriche
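
As a rough sketch of the outlier check suggested above: Pearson's r can be recomputed after setting aside genes with extreme values, to see whether a handful of genes is responsible for the negative sign. The FPKM values and the cutoff of 100 below are made up for illustration and are not the poster's data.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Made-up FPKM values for two replicates (illustrative only).
# The last gene is very high in rep1 and near zero in rep2.
rep1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])
rep2 = np.array([1.1, 2.1, 2.9, 4.2, 5.1, 0.5])

r_all = pearson(rep1, rep2)

# Hypothetical cutoff: set aside genes whose value exceeds 100 in either replicate.
keep = np.maximum(rep1, rep2) < 100
r_trimmed = pearson(rep1[keep], rep2[keep])

print(f"all genes: r = {r_all:.3f}")
print(f"without extreme genes: r = {r_trimmed:.3f}")
```

If the sign flips once a few genes are set aside, those genes are worth inspecting individually before deciding that a replicate has failed.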

Did you use rank correlation?

written 2.7 years ago by russhh
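
For reference, a minimal sketch of rank (Spearman) correlation, computed here as Pearson's r on the ranks of the values. The data are invented, and `average_ranks` is a helper written for this example, not a library function.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Invented data: 20 concordant genes plus one gene that is
# extremely high in replicate 1 and very low in replicate 2.
rep1 = np.append(np.arange(1.0, 21.0), 1000.0)
rep2 = np.append(np.arange(1.0, 21.0) + 0.5, 0.5)

pearson_r = float(np.corrcoef(rep1, rep2)[0, 1])
spearman_rho = spearman(rep1, rep2)

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Because ranks cap the influence of any single extreme value, the rank correlation stays positive in this toy case while Pearson's r is negative.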
2.7 years ago by
Floris Brenk wrote:

You can try log-transforming your values; this sometimes helps when a few highly expressed genes vary a lot and therefore drive the negative correlation. Can you share a correlation plot? That would give us more information, and we could probably help better.
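
A minimal sketch of the log-transformation idea, assuming the common log2(FPKM + 1) form (the pseudocount of 1 is an assumption, chosen so that zeros stay defined). The values are made up to show one highly expressed, discordant gene dominating the raw correlation.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Invented FPKM values: nine concordant genes and one gene that is
# very high in replicate 1 but low in replicate 2.
rep1 = np.array([2, 4, 8, 16, 32, 64, 128, 256, 512, 20000], dtype=float)
rep2 = np.array([3, 4, 7, 18, 30, 70, 120, 260, 500, 5], dtype=float)

raw_r = pearson(rep1, rep2)

# log2 with a pseudocount of 1 compresses the dynamic range,
# so a single extreme gene has much less leverage.
log_r = pearson(np.log2(rep1 + 1), np.log2(rep2 + 1))

print(f"raw FPKM: r = {raw_r:.3f}")
print(f"log2(FPKM + 1): r = {log_r:.3f}")
```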

2.7 years ago by
Bergen, Norway
Michael Dondrup wrote:

Please also try summarization units other than FPKM, e.g. TPM or CPM, and try an MDS (multidimensional scaling) plot. We have had many posts about the flawed concept of FPKM, and this might be another effect of it. In my experience, samples in FPKM values cluster by technical batch more than by biological similarity. If there is still low correlation between the biological replicates, you can start investigating them further.
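
To make the units concrete, here is a hedged sketch of CPM, TPM and a classical (Torgerson) MDS embedding using only NumPy. The count matrix, gene lengths and sample layout (two controls, two infested) are invented, and classical MDS on Euclidean distances between log-CPM profiles is just one possible choice; dedicated RNA-seq packages use their own distance definitions.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale each sample."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_bp, dtype=float)[:, None]
    return rate / rate.sum(axis=0) * 1e6

def classical_mds(dist, k=2):
    """Classical MDS: eigendecomposition of the double-centred squared distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))

# Invented raw counts: rows are genes, columns are samples
# (columns 0-1: control replicates, columns 2-3: infested replicates).
counts = np.array([
    [100, 110, 300, 280],
    [200, 190, 100, 120],
    [ 50,  55,  50,  45],
    [400, 420, 100,  90],
    [ 10,  12,  80,  85],
])
lengths_bp = [1000, 2000, 500, 4000, 1500]  # hypothetical gene lengths

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_bp)

# Euclidean distances between samples on log2(CPM + 1) profiles.
profiles = np.log2(cpm_values + 1).T
diff = profiles[:, None, :] - profiles[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

coords = classical_mds(dist, k=2)
for name, (x, y) in zip(["ctrl_1", "ctrl_2", "inf_1", "inf_2"], coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In both CPM and TPM every sample sums to one million, which is what makes values comparable across samples. In the MDS coordinates, replicates of the same condition would be expected to sit closer to each other than to samples from the other condition.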
