Question

Correlation between biological replicates

1

7.8 years ago

rauwaneme ▴ 10

Hello everyone

I did a transcriptome analysis on a cassava genotype harvested at 24 hr and 72 hr and compared it with non-infested healthy plants. I had 3 biological replicates for each time point, analysed with tophat/cufflinks/cuffdiff. I used the cufflinks RNA-seq output for the 3 biological replicates to test the correlation between replicates in R and in Excel. However, I get a negative correlation, because there are large differences in the FPKM values. Can you advise how I can calculate the correlation between the replicates?

Or is there any other method I could use?

Molemi.

rna-seq • 7.7k views

ADD COMMENT • link updated 7.8 years ago by Michael 54k • written 7.8 years ago by rauwaneme ▴ 10

1

If I understand correctly, you've calculated Pearson's correlation coefficient between replicates and you obtained negative values. Although one generally expects replicates to have high positive correlation, this is not always the case. Low correlation between replicates can be used as a quality control criterion e.g. to remove bad/failed replicates.

Have you looked at the data, i.e. plotted the replicates against each other? This would make it easier to spot whether the problem is caused by outliers. You may want to use raw counts for this.

ADD REPLY • link 7.8 years ago by Jean-Karim Heriche 27k

0

Did you use rank correlation?
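To illustrate what rank correlation changes, here is a minimal sketch in plain Python. The FPKM values are made up for illustration, and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical, not taken from any package. Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up FPKM values for ten genes in two replicates (illustration only).
# The last gene is extremely high in replicate A and low in replicate B.
rep_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 5000.0]
rep_b = [1.2, 2.1, 3.3, 4.2, 5.1, 6.3, 7.2, 8.1, 9.3, 0.5]

print("Pearson :", round(pearson(rep_a, rep_b), 3))
print("Spearman:", round(spearman(rep_a, rep_b), 3))
```

With these toy numbers the single discordant gene pulls Pearson's coefficient below zero, while Spearman's stays positive because that gene counts as only one rank among ten.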

ADD REPLY • link 7.8 years ago by russhh 5.7k

score 0 · Answer 1 · 2016-07-20

0

7.8 years ago

Floris Brenk ★ 1.0k

You can try log-transforming your values. This sometimes helps when a few highly expressed genes vary a lot and therefore drive the negative correlation. Can you share a correlation plot? That would give us more information, and we could probably help better then.
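As a rough sketch of the log-transform idea, the plain-Python snippet below compares Pearson's coefficient on raw values and on log2(FPKM + 1). The data are invented for illustration, the pseudocount of 1 is an assumption (a common choice to avoid taking the log of zero), and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_plus_one(values):
    """log2(x + 1); the pseudocount of 1 is an assumed convention."""
    return [math.log2(v + 1.0) for v in values]

# Made-up FPKM values (illustration only): five genes agree closely,
# while two highly expressed genes differ about threefold in opposite
# directions between the replicates.
rep_a = [1.0, 4.0, 16.0, 60.0, 250.0, 9000.0, 3000.0]
rep_b = [1.2, 3.5, 18.0, 55.0, 230.0, 3000.0, 9000.0]

raw_r = pearson(rep_a, rep_b)
log_r = pearson(log2_plus_one(rep_a), log2_plus_one(rep_b))

print("Pearson on raw FPKM      :", round(raw_r, 3))
print("Pearson on log2(FPKM + 1):", round(log_r, 3))
```

On the raw scale the two highly expressed genes dominate the sums, so the coefficient is low. After the transform every gene contributes on a comparable scale and the overall agreement becomes visible.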

ADD COMMENT • link 7.8 years ago by Floris Brenk ★ 1.0k

score 0 · Answer 2 · 2016-07-20

Please also try summarization units other than FPKM, e.g. TPM or CPM, and try an MDS (multidimensional scaling) plot. We have had many posts about the flawed concept of FPKM, and this might be another effect of it. In my experience, samples expressed as FPKM values cluster by technical batch more than by biological similarity. If the correlation between biological replicates is still low, you can start investigating them further.
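For reference, both units can be derived with a per-sample rescaling. The sketch below uses the standard definitions: TPM for a gene is its FPKM divided by the sum of all FPKM values in that sample, times one million, and CPM is the raw count divided by the sample's total count, times one million. The function names and the numbers are hypothetical, for illustration only.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm)
    return [v / total * 1e6 for v in fpkm]

def counts_to_cpm(counts):
    """Counts per million: raw counts scaled by the sample's library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

# Made-up values for three genes in a single sample (illustration only).
sample_fpkm = [10.0, 30.0, 60.0]
sample_counts = [50, 150, 800]

tpm = fpkm_to_tpm(sample_fpkm)
cpm = counts_to_cpm(sample_counts)

print("TPM:", tpm)
print("CPM:", cpm)
```

Because each sample is multiplied by a single constant, this rescaling on its own does not change the Pearson correlation between two samples. Any difference would show up in comparisons that depend on absolute scale across samples, such as distance-based clustering or an MDS plot.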