Question

Pearson correlation for RNAseq data - input formats

5 months ago

Lada ▴ 30

Hi guys,

I work with tiny crustaceans and did RNAseq on different species, each with 3 biological replicates (each replicate being a pool of 3 individuals). I want to check if there is a good correlation between my replicates.

I trimmed my reads, assembled them de novo with Trinity, quantified the transcripts with Salmon (an alignment-free method), and built expression matrices. I plan to run the Pearson correlation test in R with the cor function. I have a few questions:

1. What is the correct input for this test: raw counts, TPMs, or transformed counts (vst or rlog, which I can do in DESeq2)? Since RNAseq data are heavily skewed by a small fraction of highly expressed genes, I guess I should use some kind of transformation...
2. Is it okay to do this at the isoform level, or should I do the analysis on Trinity "genes"?
3. What do you consider a cutoff for good reliability of an experiment? I read that ENCODE suggests the square of the Pearson correlation coefficient should be larger than 0.92 under ideal experimental conditions. With TPMs, I am getting correlations from 0.63 to 0.99.
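
To make the first question concrete, here is a minimal sketch of the comparison I have in mind, in plain Python. The TPM values and all names (`rep1_tpm`, `rep2_tpm`, `pearson`, `log2p1`) are made up for illustration and are not my data. Using log2(TPM + 1) is just one simple transformation, not vst or rlog. In R, I assume the analogous call would be `cor(log2(tpm + 1))` on a transcripts-by-samples matrix `tpm` (a hypothetical name).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centered values
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def log2p1(values):
    """log2(v + 1): the pseudocount of 1 keeps zero TPMs finite."""
    return [math.log2(v + 1.0) for v in values]


# Hypothetical TPMs for six transcripts in two replicates (illustration only).
# The last transcript is very highly expressed in both replicates.
rep1_tpm = [0.0, 1.0, 5.0, 10.0, 50.0, 10000.0]
rep2_tpm = [2.0, 0.0, 12.0, 4.0, 30.0, 9000.0]

r_raw = pearson(rep1_tpm, rep2_tpm)
r_log = pearson(log2p1(rep1_tpm), log2p1(rep2_tpm))

print(f"raw TPM:        r = {r_raw:.4f}, r^2 = {r_raw ** 2:.4f}")
print(f"log2(TPM + 1):  r = {r_log:.4f}, r^2 = {r_log ** 2:.4f}")
```

In this toy example the single highly expressed transcript dominates the raw-TPM correlation, so the raw value comes out higher than the log-scale value even though the low-expressed transcripts disagree quite a bit. That is the reason I suspect a transformation is needed before comparing against a cutoff.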

Tnx,

Lada

vst Pearson-correlation TPM rlog RNA-seq • 357 views

updated 5 months ago by Ram 43k • written 5 months ago by Lada ▴ 30