Question: Comparisons between datasets generated at different timepoints
l.lavitas (0) wrote:

Hi folks,

My situation is like this:

I have RNAseq, MBDseq and ChIPseq data on stem cells and on a differentiated cell type, and I want to compare the two cell types to each other. The material came from freshly isolated, sorted cells from mice, with 2 replicates for MBDseq and 3 replicates each for RNAseq and ChIPseq.

I generated all the data for the differentiated cell type recently, whereas a colleague of mine generated all the data for the stem cells one year ago. We used the same protocols and reagents, and the libraries were prepared and sequenced in the same way.

I am now wondering to what extent I can compare the data from the two cell types, since I understand that I will have trouble telling whether any differences detected are real or just due to the batch effect. I assume the batch effect will be stronger for the MBD- and ChIP-seq data and perhaps weaker for the RNAseq data, since the latter might be more "robust", so the RNAseq comparisons could be done in a more qualitative than quantitative way.
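To make the concern concrete, here is a minimal sketch (an illustration only, not part of any analysis in this thread) of why the batch effect cannot be separated from the cell-type effect in this layout. It assumes a simple linear model with an intercept, a cell-type indicator and a batch indicator. The function names and the second, "crossed" layout are hypothetical. When every stem-cell sample is in one batch and every differentiated sample is in the other, the two indicator columns are identical and the design matrix loses rank.

```python
import numpy as np

def design_matrix(cell_type, batch):
    """Build a design matrix with columns: intercept, cell type, batch.

    Both inputs are 0/1 indicators, one entry per sample.
    """
    cell_type = np.asarray(cell_type, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(cell_type), cell_type, batch])

def batch_is_separable(cell_type, batch):
    """Return True if the design matrix has full column rank, i.e. the
    cell-type and batch coefficients can be estimated separately."""
    X = design_matrix(cell_type, batch)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Layout as described in the post (3 replicates per cell type):
# stem cells (0) all from last year's batch (0),
# differentiated cells (1) all from this year's batch (1).
confounded = batch_is_separable(
    cell_type=[0, 0, 0, 1, 1, 1],
    batch=[0, 0, 0, 1, 1, 1],
)

# Hypothetical alternative: each batch contains both cell types.
crossed = batch_is_separable(
    cell_type=[0, 0, 1, 1, 0, 1],
    batch=[0, 0, 0, 1, 1, 1],
)

print(confounded, crossed)  # False True
```

In the fully confounded layout no model term can absorb the batch effect without also absorbing the cell-type effect, which is why any difference seen could come from either source.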

Maybe someone can comment on this issue and suggest which comparisons would still be accurate (also in terms of publishing these results).

Many thanks for your help.



Ryan Dale (4.8k) wrote:

Honestly I think this sort of thing happens all the time.  You can't always anticipate the full suite of experiments needed and perform them at the same exact time.  And there's often neither money nor time to re-run everything once hindsight tells you experiments X, Y and Z will be analyzed together.  In fact, most of the published papers I've recently been involved with had this sort of incremental data acquisition.  I don't recall any reviewers commenting on it . . . maybe they understood it's a common situation.

