Question

Confused about merging RNA-seq lanes/runs

1

5.9 years ago

BioinfGuru ★ 1.7k

This question, "Can we concatenate two fastq files from same sample but different runs", has me confused.

Maybe I should know better by now, but rather than sweep it under the carpet I'd like to clear up the confusion:

I was under the impression that replicates of any kind should not be merged. I thought that separate runs of the same sample were effectively technical replicates. Also, one run may work better than another, producing a batch effect.

So when is it ok and when is it not ok to merge?

Thanks all, Kenneth

Merge RNA-seq samples runs lanes • 4.7k views

ADD COMMENT • link updated 5.9 years ago by GenoMax 141k • written 5.9 years ago by BioinfGuru ★ 1.7k

score 6 · Answer 1 · 2018-06-19

6

5.9 years ago

GenoMax 141k

Some Illumina platforms (NextSeq, NovaSeq) have flowcell lanes that are optically distinct but fluidically connected. The same pool is loaded across the whole flowcell (FC), but the results may be reported on a per-lane basis (unless collapsed into one file per sample during post-processing). Those lanes can't really be considered technical replicates. If you were running the same sample on multiple flowcells, you could keep track of that by adding read groups if you want to be cautious.
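
As a rough illustration of the read-group idea, the sketch below derives a SAM `@RG` header line from the first read header of a FASTQ file. It assumes the standard Illumina header layout (`@instrument:run:flowcell:lane:tile:x:y`); the function name, the `flowcell.lane` naming scheme, and the sample and library labels are choices made for this example, not something prescribed in this thread.

```python
def read_group_from_header(header: str, sample: str, library: str) -> str:
    """Build a SAM @RG line from an Illumina-style FASTQ read header.

    Assumed header layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    # Drop the leading '@' and anything after the first whitespace.
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    flowcell, lane = fields[2], fields[3]
    # Using flowcell.lane as the unit is one common convention (an assumption here).
    unit = f"{flowcell}.{lane}"
    return "\t".join([
        "@RG",
        f"ID:{unit}",
        f"SM:{sample}",
        f"LB:{library}",
        "PL:ILLUMINA",
        f"PU:{unit}",
    ])


# Hypothetical header; instrument, flowcell and index values are made up.
example = "@NB501234:57:HXXXXBGX5:2:11101:10000:1000 1:N:0:ATCACG"
print(read_group_from_header(example, sample="sampleA", library="libA"))
```

Because every run of the same library would share `SM` and `LB` but differ in `ID` and `PU`, the data can be merged after alignment while still recording which flowcell and lane each read came from.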

Since you are sampling from the same library, data produced by multiple runs should have a similar distribution of reads (unless there are severe technical issues with a run, e.g. an air bubble in a lane or unequal sequencing lengths). Illumina's guidance has been that sequence produced by their platforms should be considered equivalent. If by a "better" run you mean a higher yield of data, then your downstream methods (e.g. DESeq2) should be accounting for that discrepancy.

The technical quality of sequencing long ago reached the point where there is no need to do technical replicates (and people have not been doing those for 5+ years).
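
One way to sanity-check the "similar distribution" claim before merging is to compare per-gene counts from the two runs after removing the difference in yield. The sketch below uses plain counts-per-million scaling and a Pearson correlation on log2(CPM + 1). This is a deliberately simple stand-in and not DESeq2's own normalization; the function names and the toy counts are assumptions made for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene counts so that each run sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("run has no counted reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log_cpm_correlation(run_a, run_b):
    """Pearson correlation of log2(CPM + 1) over genes present in both runs."""
    cpm_a, cpm_b = counts_per_million(run_a), counts_per_million(run_b)
    genes = sorted(set(cpm_a) & set(cpm_b))
    x = [math.log2(cpm_a[g] + 1) for g in genes]
    y = [math.log2(cpm_b[g] + 1) for g in genes]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: the second run has twice the yield but the same composition.
day1 = {"geneA": 100, "geneB": 50, "geneC": 5}
day2 = {gene: 2 * n for gene, n in day1.items()}
print(log_cpm_correlation(day1, day2))
```

In this toy case the second run differs only in depth, so the correlation comes out very close to 1. A clearly lower value on real data would be a reason to look for a technical problem with one of the runs before merging.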

ADD COMMENT • link 5.9 years ago by GenoMax 141k

1

Thanks Genomax.

I read the key point as "Since you are sampling from the same library data produced by multiple runs should have similar distribution of reads."

So lanes/runs don't really matter as long as they come from the same library? For clarity, I'm defining a library as a single Eppendorf tube containing a sample sent for sequencing.

So does this mean that if an aliquot from sample A is sequenced on day 1, and a second aliquot from the same sample is sequenced on day 2, then I can merge the results of those 2 sequencing runs for a greater sequencing yield?

If I'm getting these points wrong, maybe you could direct me to a resource that explains it in detail?

ADD REPLY • link 5.9 years ago by BioinfGuru ★ 1.7k

0

One library is one prep done from a sample aliquot. One can make multiple libraries from a sample. While similar, they would not be identical. If you deliberately select for different insert sizes then they definitely would be different.

People do sequence libraries over time if more data is needed. Prepared libraries are stable as long as they are stored properly. At some point in processing the data can be merged for final analysis.
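
Merging can happen at more than one stage, for example by concatenating the FASTQ files before alignment or by summing counts afterwards. As a minimal sketch of the second option, the code below sums per-gene counts over all runs that came from the same library. The run names, the run-to-library mapping, and the function name are hypothetical, and summing like this only makes sense for repeated runs of one library, not for separate libraries or biological replicates.

```python
from collections import defaultdict


def merge_runs_by_library(run_counts, run_to_library):
    """Sum per-gene counts of all runs that belong to the same library."""
    merged = defaultdict(lambda: defaultdict(int))
    for run, counts in run_counts.items():
        library = run_to_library[run]  # KeyError if a run is unassigned
        for gene, n in counts.items():
            merged[library][gene] += n
    # Convert nested defaultdicts back to plain dicts for the caller.
    return {lib: dict(genes) for lib, genes in merged.items()}


# Hypothetical example: library A was sequenced on two days, library B once.
run_counts = {
    "libA_day1": {"geneA": 10, "geneB": 5},
    "libA_day2": {"geneA": 7, "geneB": 3},
    "libB_day1": {"geneA": 2},
}
run_to_library = {"libA_day1": "libA", "libA_day2": "libA", "libB_day1": "libB"}
print(merge_runs_by_library(run_counts, run_to_library))
```

Keeping the per-run counts around until this step means an odd run can still be spotted and excluded before the totals go into the final analysis.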

ADD REPLY • link 5.9 years ago by GenoMax 141k