Best practice for merging across lanes
23 months ago
dave ▴ 20

Hi all,

I'm interested in learning more about best practices for merging RNA-seq technical replicates. I've read many Biostars posts on the matter, but I have a somewhat special case.

Background:

I sent RNA samples for sequencing, and each sample was split across 4 lanes as per normal practice. However, the sequencing depth for this run was much lower than expected, so the sequencing core re-sequenced the same samples. I now essentially have 8 technical replicates per sample: 4 lanes from each of two runs. The repeat run achieved much better depth.

Questions:

  1. In this case, should the lower depth data be discarded, or can these data still be used in combination with the updated runs?
  2. If combined, is there a need to mitigate batch effect? Aside from read depth, the samples have nearly identical statistics with respect to FastQC analysis, % genome alignment individually with STAR, etc.
  3. For combining, what stage is most appropriate? Aside from file sizes, is there any difference between merged .fastq files and merged .bam files? What about at the level of raw counts? In the past, I have merged .bam files from different lanes and found that the effect was to sum the raw read counts per gene across replicates.
RNA-Seq • 1.0k views

Are they exactly the same samples? Or even the leftover library prep from the first run?


Great question! Input RNA isolate is identical, though I am unsure if the library prep was repeated between runs. I'll ask and let you know ASAP.

23 months ago

It's very unlikely that the library prep was redone. You should almost certainly just combine everything; running the same library on different lanes or on different days generally does not introduce meaningful technical artifacts.

The only exception would be if the first run had a serious technical problem with the instrument itself causing the reads to be totally unusable, but I doubt they would have given you the reads at all if that were the case.
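To illustrate what "putting everything together" looks like at the raw-count level, here is a minimal sketch with pandas. The count values and the `<sample>_<run>_<lane>` column naming are made up for the example, and `sum_technical_replicates` is a hypothetical helper, not part of any pipeline mentioned in this thread. It simply sums per-gene counts over every library that belongs to the same sample, which matches the summing behaviour described in the question for merged .bam files.

```python
import pandas as pd

# Hypothetical per-library raw counts: rows are genes,
# columns are named "<sample>_<run>_<lane>" (an assumed naming scheme).
counts = pd.DataFrame(
    {
        "S1_run1_L1": [10, 0, 5],
        "S1_run2_L1": [40, 2, 21],
        "S2_run1_L1": [8, 1, 7],
        "S2_run2_L1": [35, 3, 30],
    },
    index=["geneA", "geneB", "geneC"],
)


def sum_technical_replicates(counts: pd.DataFrame) -> pd.DataFrame:
    """Sum raw counts over all columns that share the same sample prefix."""
    # Take the part of each column name before the first underscore.
    sample_ids = counts.columns.str.split("_").str[0].to_numpy()
    # Transpose so libraries are rows, group by sample ID, sum, transpose back.
    return counts.T.groupby(sample_ids).sum().T


merged = sum_technical_replicates(counts)
print(merged)
```

Only raw counts should be summed this way; normalized values such as CPM or TPM would need to be recomputed from the merged counts.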


Thank you very much for your answer; this is what I suspected. Upon inspection of the two runs, the results from FastQC -> alignment -> raw counts appear identical in parallel, with the only difference being read depth. I'll proceed with caution and likely run a PCA to confirm that samples from the same experimental condition fall in line after adjusting for library size.


Just to follow up on this: PCA of the top 500 most variable genes across all samples did reveal that the paired samples from each sequencing run overlapped, so I feel very comfortable merging these data now. Thanks again!
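For anyone who wants to repeat this kind of check, here is a rough sketch using only NumPy. The counts are simulated (two conditions, two samples each, every sample sequenced once shallow and once deep), and the helper names `log_cpm` and `pca_top_variable` are invented for the example. The log2(CPM + 1) transform is one simple way to adjust for library size; it is an assumption here, not necessarily what was used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumption): 2000 genes, the first 200 respond to treatment.
n_genes = 2000
base = rng.uniform(50.0, 500.0, size=n_genes)
labels, columns = [], []
for cond, fold in (("ctrl", 1.0), ("treat", 3.0)):
    effect = np.ones(n_genes)
    effect[:200] = fold
    for rep in (1, 2):
        # One biological profile per sample, shared by both runs.
        mu = base * effect * rng.lognormal(0.0, 0.1, n_genes)
        for run, depth in (("run1", 0.2), ("run2", 1.0)):
            labels.append(f"{cond}{rep}_{run}")
            columns.append(rng.poisson(mu * depth))
counts = np.column_stack(columns)  # genes x libraries


def log_cpm(counts):
    """log2(counts per million + 1), which adjusts for library size."""
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / lib_size * 1e6 + 1.0)


def pca_top_variable(logexpr, n_top=500, n_components=2):
    """PCA (via SVD) on the n_top most variable genes; returns library scores."""
    top = np.argsort(logexpr.var(axis=1))[::-1][:n_top]
    x = logexpr[top].T              # libraries x genes
    x = x - x.mean(axis=0)          # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained


scores, explained = pca_top_variable(log_cpm(counts))
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label:12s} PC1={pc1:8.2f} PC2={pc2:8.2f}")
```

If the two runs of each sample land close together and the main axis separates the experimental conditions rather than the runs, that supports merging.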


Based on the comment thread, it appears that you can just merge the data as-is.

But if read depth had been an issue, you could have treated depth as a quantitative technical confounder, while treating the sample ID as the biological signal to preserve. For what it's worth, I recently developed a method for this type of setting: https://github.com/calvinmccarter/condo-adapter
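As a generic illustration of that idea (and explicitly not the method in the linked repository, whose API is not shown here), one could fit an ordinary least-squares model per gene with one indicator per sample plus a depth covariate, then subtract only the depth term. The sketch below uses NumPy; `remove_depth_effect` and all the numbers are hypothetical, and it assumes depth varies within each sample, e.g. a shallow and a deep run.

```python
import numpy as np


def remove_depth_effect(logexpr, sample_ids, log_depth):
    """Regress out a depth covariate while keeping per-sample means.

    logexpr:    libraries x genes matrix of log-scale expression
    sample_ids: one sample label per library
    log_depth:  one log-scale sequencing depth per library
    """
    samples, inverse = np.unique(sample_ids, return_inverse=True)
    sample_design = np.eye(len(samples))[inverse]   # one indicator column per sample
    depth_centered = log_depth - log_depth.mean()
    design = np.column_stack([sample_design, depth_centered])
    coef, *_ = np.linalg.lstsq(design, logexpr, rcond=None)
    depth_slope = coef[-1]                          # one fitted slope per gene
    # Subtract only the depth component; sample effects are left untouched.
    return logexpr - np.outer(depth_centered, depth_slope)


# Toy, noise-free example (all values are made up).
sample_ids = np.array(["S1", "S1", "S2", "S2", "S3", "S3"])
log_depth = np.log2(np.array([5e6, 25e6, 4e6, 30e6, 6e6, 28e6]))
rng = np.random.default_rng(1)
sample_means = rng.normal(5.0, 1.0, size=(3, 4))
true_signal = sample_means[[0, 0, 1, 1, 2, 2]]
depth_slopes = np.array([0.3, -0.2, 0.0, 0.1])
observed = true_signal + np.outer(log_depth - log_depth.mean(), depth_slopes)

corrected = remove_depth_effect(observed, sample_ids, log_depth)
print(np.abs(corrected - true_signal).max())
```

In real data the fit would be noisy, and with only two depths per sample the slope estimate rests on very few points, so this is best treated as a diagnostic rather than a routine correction.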
