Hello all,
Naïve, but hopefully simple question:
I’m conducting a replicated RNAseq experiment to assay tissue-specific gene expression profiles. I would like to statistically evaluate the reproducibility of gene expression values across a set of biological replicates. This seems like a valuable preliminary step, but I cannot find good documentation on how to approach it. Perhaps I am missing something fundamental?
So far, I’ve used DESeq2 to test whether any genes differ significantly between replicates. I’ve entered each replicate as its own un-replicated condition (see below), which DESeq then treats as replicates of one another. Read counts are fairly high for each gene, but sequencing depth varies across replicates.
Example sample input for DESeq analysis (i.e., colData):
       condition  type
data1  data1      paired-end
data2  data2      paired-end
data3  data3      paired-end
I'm not sure whether this is a reasonable approach, although DESeq’s assumptions seem the most appropriate here. I also worry that, with only three replicates, power may be too low for the results to be meaningful. Any suggestions?
Thanks!
Why not just look at the correlation of normalized counts? That would (A) be fast and (B) seem to directly address what you're trying to do.
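For what it's worth, here is a rough sketch of that correlation check in Python with NumPy. It is not DESeq2 itself. It assumes a genes-by-replicates matrix of raw counts, normalizes by median-of-ratios size factors (the scheme DESeq2 describes), and correlates the log2-transformed values. The function names and the toy counts are made up for illustration.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column)."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so the logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median ratio of each sample to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def replicate_correlation(counts):
    """Pearson correlation between replicates on log2(normalized count + 1)."""
    counts = np.asarray(counts, dtype=float)
    normalized = counts / size_factors(counts)
    # The log transform keeps a few highly expressed genes from dominating r.
    return np.corrcoef(np.log2(normalized + 1.0), rowvar=False)

# Hypothetical toy data: 6 genes (rows) x 3 replicates (columns).
counts = np.array([
    [100, 210, 150],
    [ 20,  38,  33],
    [500, 990, 760],
    [  0,   3,   1],
    [ 75, 160, 110],
    [ 10,  25,  14],
])

corr = replicate_correlation(counts)
print(np.round(corr, 3))
```

The result is a replicates-by-replicates matrix; off-diagonal values near 1 suggest the replicates agree well. A rank-based (Spearman) correlation would be a reasonable alternative if you would rather not pick a transform.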
Regarding power, that'll depend on the effect size that you hope to see. If whatever you're doing has a large effect then 3 replicates is probably OK. If it has a small to moderate effect then you might need more. The only way to know is to give things a go and see what happens, of course.