Question

Any problem re-using last year's RNA-Seq data in this year's analysis?

0

3.7 years ago

oakhamwolf ▴ 20

Dear all,

This may be a very simple question, but it is one I am not sure about.

Last year we did some RNA-Seq to identify any off-target effects of a therapeutic. This was a fairly simple experimental design involving three pseudo-biological replicates of an untreated control and two treatment doses. We looked for genes that were differentially expressed (DE) between the untreated and treated samples to flag potential off-target effects.

This year we are determining the effect of the therapeutic in patients using RNA-Seq. Initially, we want to compare the untreated patient data with a control to estimate the baseline disease phenotype. One option is to re-submit the same control sample from our previous experiment for sequencing. However, this would obviously take up space on the run and we'd rather use that space for deeper sequencing of our patient samples.

So my question is: is it safe to re-use the control RNA-Seq data from last year? The new patient data will be generated in the same way (same library prep, sequencing platform, read length) and processed in the same way (STAR alignment, RSEM quantification, DE using edgeR). I am just concerned that last year's control data may harbour bias because it was not prepared and run at the same time as this year's patient samples.

Incidentally, we will also submit a different control sample as part of this year's RNA-Seq run, but re-using last year's control data in addition should help us get a better idea of what is going on in our patients.

Any advice would be very gratefully received. Many thanks in advance.

Kind regards

RNA-Seq batch-effect • 894 views

ADD COMMENT • link updated 9 days ago by Ram 43k • written 3.7 years ago by oakhamwolf ▴ 20

score 2 · Accepted Answer · 2020-08-11

2

3.7 years ago

Joe 21k

Generally, no, it is not safe: there will almost certainly be batch effects.

Is last year's data also from patients, or from some kind of in vitro experiment?

You might be able to use the old control in concert with the new control with sufficient sanity checking, but chances are you'll lose some of the subtle differences.
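One way to see why the new control matters when combining the two runs: if last year's controls were the only controls, batch and condition would be perfectly confounded and no model could separate them. Below is a minimal sketch of that check, assuming a simple additive design (intercept + condition + batch). The sample labels and the helper names (`matrix_rank`, `design_matrix`, `is_confounded`) are made up for illustration and are not part of edgeR or any other package.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design_matrix(samples):
    """Columns: intercept, is_patient, is_new_batch (hypothetical coding)."""
    return [[1, int(cond == "patient"), int(batch == "new")]
            for cond, batch in samples]

def is_confounded(samples):
    """True if condition and batch effects cannot be estimated separately."""
    design = design_matrix(samples)
    return matrix_rank(design) < len(design[0])

# Hypothetical sample sheets: (condition, batch) per sample.
old_controls_only = [("control", "old")] * 3 + [("patient", "new")] * 3
with_new_control = old_controls_only + [("control", "new")]

print(is_confounded(old_controls_only))
print(is_confounded(with_new_control))
```

A full-rank design only means the batch term can be estimated at all. With a single new control, that estimate rests on one sample, so it says nothing about how well the correction works.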

ADD COMMENT • link 3.7 years ago by Joe 21k

0

Thanks Joe, this is what my gut was telling me. Last year's data and this year's data are from iPSC-derived RPE cells, either from a control individual or from patients. I was mainly wondering whether the field had come far enough that current batch correction methods would allow me to incorporate last year's data. Is this what you mean by sanity checking? Many thanks again.

ADD REPLY • link 3.7 years ago by oakhamwolf ▴ 20

1

Batch correction is not my strong suit by any stretch, so I can't say much about how far correction methods have come, but generally speaking, good experimental design principles alone would argue against using the older data.

If your phenotypes are real and pronounced in any way, I'd expect them to shine through the noise in the data in any case. So by sanity check, I really just mean: take a look at what you can conclude from the 2 batches of data analysed separately, and see whether they mesh into one biological story.

If the data is really good, and you see what you are expecting in both, then it's more compelling to have them as separate validating experiments anyway.

If you see nothing or only very subtle effects in one or both of the sets of data, merging them is likely to just amplify the noise and make it harder to see what's happening in those subtle cases.

At least that would be my intuition anyway!

ADD REPLY • link 3.7 years ago by Joe 21k

0

Great, thanks again. I agree with the comment about good experimental design principles; this is mainly a stab at getting more sequencing data for our patient samples! So maybe the way forward would be, as you say, not to blindly use both control samples in the same analysis, but to compare the patient samples to each set of control samples separately, compare the two control samples to each other, and then make a judgement call on how that biological story looks. If I've got that right, then this makes sense. Thanks for your input :)
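For anyone wanting to try the plan above, here is a rough sketch of the three comparisons: patients vs old controls, patients vs new controls, and old vs new controls. It uses plain log2 counts-per-million fold changes on made-up toy counts. It is only a crude first look, not a substitute for the actual edgeR analysis, and the function names, the 0.5 pseudo-count, and the `min_abs` threshold are all assumptions for illustration.

```python
import math

def log_cpm(counts):
    """log2 counts-per-million for one sample (gene -> raw count), with a small pseudo-count."""
    total = sum(counts.values())
    return {g: math.log2((c + 0.5) / (total + 1.0) * 1e6) for g, c in counts.items()}

def mean_log_cpm(samples):
    """Average log-CPM per gene across a list of samples."""
    per_sample = [log_cpm(s) for s in samples]
    return {g: sum(s[g] for s in per_sample) / len(per_sample) for g in per_sample[0]}

def log_fold_change(group_a, group_b):
    """Per-gene difference in mean log-CPM (group_a minus group_b)."""
    a, b = mean_log_cpm(group_a), mean_log_cpm(group_b)
    return {g: a[g] - b[g] for g in a}

def sign_concordance(lfc_1, lfc_2, min_abs=1.0):
    """Fraction of genes with |LFC| >= min_abs in both comparisons that move in the same direction."""
    shared = [g for g in lfc_1
              if g in lfc_2 and abs(lfc_1[g]) >= min_abs and abs(lfc_2[g]) >= min_abs]
    if not shared:
        return None
    agree = sum(1 for g in shared if (lfc_1[g] > 0) == (lfc_2[g] > 0))
    return agree / len(shared)

# Toy counts, invented purely for illustration (one sample per group).
patients = [{"A": 400, "B": 10, "C": 100, "D": 100}]
old_controls = [{"A": 100, "B": 100, "C": 100, "D": 100}]
new_controls = [{"A": 90, "B": 120, "C": 110, "D": 95}]

lfc_vs_old = log_fold_change(patients, old_controls)
lfc_vs_new = log_fold_change(patients, new_controls)
lfc_controls = log_fold_change(old_controls, new_controls)

# Do the two patient-vs-control comparisons tell the same story?
print(sign_concordance(lfc_vs_old, lfc_vs_new))
# Largest control-vs-control shift: a rough indicator of batch (or donor) differences.
print(max(abs(v) for v in lfc_controls.values()))
```

If the control-vs-control differences are as large as the patient-vs-control ones, that is a warning sign that batch (or donor) effects dominate, and the two sets are better kept as separate experiments.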