Hello,

I would like to clarify a conceptual point about an analytical approach, based on my understanding of the fold-change patterns observed in an RNA-seq differential expression (DE) experiment. When the experiment is successful, one obtains distinctive patterns of gene/transcript expression, which are visible when plotted on a scatterplot.

Suppose two datasets/replicates are compared against each other using their respective log2 FC values to assess correlation. Because of the distinctive fold-change pattern, if they correlate, the scatterplot should show a diagonal linear distribution, and a line fitted through it should have a slope of ~1.

Under that rationale, if I compare two replicates and observe a diagonal linear distribution (m = ~1), I assume the replicates correlate and that I can move forward with confidence, either by merging the resulting reads or by gathering candidate genes per replicate for further analysis. If the pattern instead has a "cross-like" distribution, with a fitted line of slope ~0, I conclude that there is some major difference between the replicates and that I have to be cautious moving forward.

Can anyone expand on what the implications of low correlation (m = ~0) would be when performing this log2FC vs log2FC comparison? Does the "cross" or "cross-like" pattern mean that there is no correlation between replicates and that the data should not be used? Or could this pattern indicate major differences such as "batch" effects, while the data still contain DE patterns relevant enough to continue with the analysis?

Thanks, any insight is appreciated!

Best,

Pablo

Please read and apply "How to add images to a Biostars post".

How are you comparing replicates if there's a fold-change? What's the fold-change versus?

I'm comparing the fold change of treated rep1 vs control with the fold change of treated rep2 vs control.
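For readers following along, the comparison described here can be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the pseudocount of 1, and the toy count vectors are hypothetical and are not taken from the original analysis, which may have obtained its log2 FC values from a dedicated DE tool instead.

```python
import numpy as np

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount (an assumption) avoids division by zero."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    return np.log2((treated + pseudocount) / (control + pseudocount))

def compare_log2fc(treated_rep1, treated_rep2, control, pseudocount=1.0):
    """Return (slope, Pearson r) for rep2 log2FC regressed on rep1 log2FC."""
    x = log2_fold_change(treated_rep1, control, pseudocount)
    y = log2_fold_change(treated_rep2, control, pseudocount)
    slope, _intercept = np.polyfit(x, y, 1)  # least-squares straight line
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
    return slope, r

# Made-up counts for five genes, for illustration only.
control = [100, 50, 200, 10, 80]
treated_rep1 = [200, 25, 210, 40, 80]
treated_rep2 = [190, 30, 190, 35, 85]

slope, r = compare_log2fc(treated_rep1, treated_rep2, control)
print(f"slope = {slope:.2f}, Pearson r = {r:.2f}")
```

Note that both fold changes in this sketch share the same control values, so the control enters both axes of the scatterplot.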

Please plot the normalized counts instead; fewer variables can then affect the results.
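A minimal sketch of what comparing replicates on normalized counts could look like is given below. It assumes a simple counts-per-million (CPM) scaling followed by a log2 transform as a stand-in for whatever normalization the DE pipeline actually provides; the function names and the toy counts are hypothetical.

```python
import numpy as np

def log2_cpm(counts, pseudocount=1.0):
    """Scale raw counts to counts per million, then log2-transform (pseudocount is an assumption)."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum() * 1e6
    return np.log2(cpm + pseudocount)

def replicate_correlation(counts_rep1, counts_rep2):
    """Pearson correlation between two replicates on the log2 CPM scale."""
    a = log2_cpm(counts_rep1)
    b = log2_cpm(counts_rep2)
    return np.corrcoef(a, b)[0, 1]

# Made-up raw counts for five genes in two treated replicates.
treated_rep1 = [200, 25, 210, 40, 80]
treated_rep2 = [190, 30, 190, 35, 85]

print(f"Pearson r = {replicate_correlation(treated_rep1, treated_rep2):.3f}")
```

Here each replicate is compared directly with the other, so no control sample enters the calculation and there is one fewer source of variation than in the log2 FC comparison.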

Thanks for the response, that is what I've been advised as well! So the take-home message is that fold changes can be noisy and are therefore not the best basis for assessing correlation between replicates?

It's more that "keep it simple" is the best strategy.