How to better distribute my replicates between two batches of bulk RNAseq?
11 months ago
Victor • 0

Hello everyone!

I want to run an RNAseq experiment with 18 samples, each with 3 replicates (54 RNA libraries in total). We have two flowcells available: one can fit 20 libraries and the other can fit 40.

We were wondering how best to distribute the replicate libraries across these flowcells so that we can correct for batch effects during the differential expression analysis with DESeq2. We are considering putting 1 library from every sample in the first flowcell and the other 2 in the second, but we have not found good references on this particular question in experimental design. We have other options, like running 6 samples (x3 replicates) first and 12 samples (x3 replicates) second, or any other combination of samples.
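To make the two layouts above concrete, here is a small sketch that counts the libraries per flowcell and checks whether any sample ends up entirely on one flowcell (in which case sample and flowcell cannot be separated later). The sample names `S01`..`S18` and the flowcell labels `FC1`/`FC2` are made up for illustration; only the 18 x 3 design and the 20/40 capacities come from the post.

```python
from collections import defaultdict

SAMPLES = [f"S{i:02d}" for i in range(1, 19)]  # 18 hypothetical sample names
REPLICATES = (1, 2, 3)
CAPACITY = {"FC1": 20, "FC2": 40}  # library limits stated in the question


def split_replicates():
    """Layout A: replicate 1 of every sample on FC1, replicates 2 and 3 on FC2."""
    return {(s, r): ("FC1" if r == 1 else "FC2")
            for s in SAMPLES for r in REPLICATES}


def split_samples():
    """Layout B: all replicates of the first 6 samples on FC1, the rest on FC2."""
    return {(s, r): ("FC1" if i < 6 else "FC2")
            for i, s in enumerate(SAMPLES) for r in REPLICATES}


def summarize(layout):
    """Return (libraries per flowcell, fits capacity?, samples on one flowcell only)."""
    load = defaultdict(int)
    flowcells_per_sample = defaultdict(set)
    for (sample, _rep), flowcell in layout.items():
        load[flowcell] += 1
        flowcells_per_sample[sample].add(flowcell)
    fits = all(load[fc] <= CAPACITY[fc] for fc in load)
    confined = sorted(s for s, fcs in flowcells_per_sample.items() if len(fcs) == 1)
    return dict(load), fits, confined


for name, layout in [("A", split_replicates()), ("B", split_samples())]:
    load, fits, confined = summarize(layout)
    print(name, load, "fits:", fits, "samples on a single flowcell:", len(confined))
```

Both layouts put 18 libraries on the first flowcell and 36 on the second, so both fit. The difference is that in layout A every sample is present on both flowcells, while in layout B all 18 samples sit on a single flowcell each.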

Therefore, the question. How to better distribute my replicates between two batches of bulk RNAseq? Additionally, is DESeq2 able to collapse replicates and account for batch effect in the same experiment?

replicates RNAseq batch-effect • 844 views
11 months ago
GenoMax 142k

"one can fit 20 libraries and the other can fit 40."

There is no limit on how many libraries "fit" on a flowcell. If your indexes overlap between the two pools then that may be a limitation on how you need to run these samples.

If the indexes do not overlap in these libraries, simply create one large pool and run it on as many flowcells as you need. There is generally no (or a negligible) batch effect as long as you are not changing chemistries or sequencer types.


I guess I forgot to mention one detail. We have run these samples previously and we had gDNA contamination. The idea is to sequence the first 20 libraries (we have a desired number of reads per library, which is why we set this maximum), see whether we managed to remove the contamination, and then proceed to sequence their replicates. Do you have any suggestions in this context?


I don't see why you couldn't pool them all and check them all, then run them all again to get up to the right number of reads.

This sounds like more of a job for a MiSeq; then you run your samples for real on something bigger, like a NextSeq or a NovaSeq.


"see whether we managed to remove the contamination"

Depending on what you did to address that, there is a chance that this manipulation is going to introduce some batch effect for these 20 samples.

Ideally you should have simply sequenced more of the original libraries (assuming all samples were identically processed at the same time and only a few ended up with the gDNA?).


Thank you a lot for your help!

The original libraries were heavily contaminated and there is no way to use them, but I like your idea of creating one large pool. I will discuss this with the wet lab.

11 months ago

Running the samples on different flowcells does not add significant technical artifacts. So don't worry about it.

Or pool all the libraries into one pool, and run them on multiple flowcells, and then just join the fastqs together, assuming your sample barcodes allow for that.
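If you do go the "join the fastqs" route, gzipped FASTQ files from the same library can simply be concatenated byte for byte, because the gzip format allows several compressed members in one file. A minimal sketch, assuming one `.fastq.gz` per library per flowcell (the function name `join_fastqs` and any file names you pass in are just placeholders):

```python
import shutil
from pathlib import Path


def join_fastqs(parts, out_path):
    """Append each gzipped FASTQ in `parts` to `out_path`, byte for byte.

    No decompression is needed: concatenated gzip members form a valid
    gzip stream, so downstream tools read the result as one file.
    """
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

For paired-end data, join the R1 files and the R2 files separately and list the flowcells in the same order for both, so the mates stay in sync.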

If you split them, you might get a slight 'batch effect' due to differing numbers of reads, but DESeq is in general robust to this.

I'm unclear what you mean by "replicates". If you mean you took the same biological sample and chopped it into thirds before doing benchwork on it, those are technical replicates, not the replicates that DESeq wants. Those should be combined; you never needed or wanted them to begin with. If they are biological replicates (three different samples of the same kind), you don't ever combine those.
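"Combined" here means summing the raw counts of the libraries that came from the same biological sample, which is what DESeq2's `collapseReplicates` does on the R side. Just to illustrate the idea, here is a toy sketch in plain Python; the library IDs, gene names and counts are invented, and the function name is hypothetical.

```python
from collections import defaultdict


def collapse_technical_replicates(counts, library_to_sample):
    """Sum raw counts of libraries that come from the same biological sample.

    counts:            {library_id: {gene: raw_count}}
    library_to_sample: {library_id: biological_sample_id}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for library, gene_counts in counts.items():
        sample = library_to_sample[library]
        for gene, n in gene_counts.items():
            collapsed[sample][gene] += n
    return {sample: dict(genes) for sample, genes in collapsed.items()}


# Toy data: two technical replicates of sample "mouse1", one library of "mouse2".
toy_counts = {
    "lib1": {"geneA": 10, "geneB": 0},
    "lib2": {"geneA": 12, "geneB": 3},
    "lib3": {"geneA": 7, "geneB": 5},
}
toy_map = {"lib1": "mouse1", "lib2": "mouse1", "lib3": "mouse2"}
print(collapse_technical_replicates(toy_counts, toy_map))
```

Note that this only applies to technical replicates. Biological replicates stay as separate columns in the count matrix.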


Thanks for your answer. I guess I forgot to mention that we have run these samples previously and we had gDNA contamination. The idea is to sequence the flowcells on different days and in different runs, since the first run would be to confirm low gDNA contamination and the second to complete the replicates. I believe DESeq2 would be unable to account for the batch effects because we need to collapse the samples first, correct?
