Question: Confused about merging RNA-seq lanes/runs
9 months ago by
YaGalbi1.4k
Biocomputing, MRC Harwell Institute, Oxford, UK
YaGalbi1.4k wrote:

This question: "Can we concatenate two fastq files from same sample but different runs" has me confused unfortunately.

Maybe I should know better by now, but rather than sweeping it under the carpet I'd rather clear up the confusion:

I was under the impression that no form of replicate should be merged. I thought that separate runs of the same sample are effectively technical replicates. Also, one run may work better than another, producing a batch effect.

So when is it ok and when is it not ok to merge?

Thanks all, Kenneth

rna-seq runs samples merge lanes • 620 views
9 months ago by
genomax64k
United States
genomax64k wrote:

Some Illumina platforms (NextSeq, NovaSeq) have flowcell lanes that are optically distinct but fluidically connected. The same pool is loaded across the whole flowcell (FC), but the results may be reported on a per-lane basis (unless they are collapsed into one file per sample during post-processing). Those per-lane files can't really be considered technical replicates. If you were running the same sample on multiple flowcells, you could keep track of that by adding read groups if you want to be cautious.
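
As a rough illustration of collapsing per-lane files into one file per sample, here is a minimal Python sketch. It assumes bcl2fastq-style file names (`<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`); the function names `group_lane_files` and `merge_lanes` are made up for this example. It relies on the fact that gzip files can be appended to each other and still decompress as one stream. R1 and R2 are merged in the same lane order so that read pairs stay in sync.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_lane_files(filenames):
    """Map (sample, read) -> list of lane files sorted by lane number."""
    groups = defaultdict(list)
    for name in filenames:
        match = LANE_PATTERN.match(Path(name).name)
        if match is None:
            continue  # ignore files that do not follow the assumed naming
        key = (match["sample"], match["read"])
        groups[key].append((int(match["lane"]), str(name)))
    return {key: [name for _, name in sorted(files)] for key, files in groups.items()}


def merge_lanes(filenames, out_dir):
    """Concatenate per-lane gzip files into one file per sample and read."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), lane_files in group_lane_files(filenames).items():
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as out:
            for lane_file in lane_files:
                with open(lane_file, "rb") as src:
                    # Appending gzip members yields a valid multi-member gzip file.
                    shutil.copyfileobj(src, out)
        merged[(sample, read)] = target
    return merged
```

If the flowcell or run of origin should stay traceable after merging, that information has to be recorded separately (for example as read groups at alignment time), since plain concatenation discards the file boundaries.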

Since you are sampling from the same library, data produced by multiple runs should have a similar distribution of reads (unless there are severe technical issues with a run, e.g. an air bubble in a lane or unequal sequencing lengths). Illumina's guidance has been that sequence produced by their platforms should be considered equivalent. If by a "better" run you mean a higher yield of data, then your downstream methods (e.g. DESeq2) should be accounting for that discrepancy.
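
To make the point about yield concrete, here is a small Python sketch of median-of-ratios size factors, the general idea behind DESeq2's depth normalization. This is not DESeq2's code, only an illustration; the function name `size_factors` and the toy counts are invented for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios idea.

    counts: dict mapping sample name -> list of per-gene counts,
    with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample so logs are defined.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor = median ratio of a sample's counts to the geometric means.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g] for g in usable))
        for s in samples
    }


# Toy data: the same library sequenced twice, the second run with double the yield.
toy_counts = {"run1": [10, 20, 30, 40], "run2": [20, 40, 60, 80]}
factors = size_factors(toy_counts)
normalized = {
    s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts
}
```

In this toy case the size factor of `run2` comes out at about twice that of `run1`, so the normalized counts of the two runs agree: a difference in yield alone does not look like a difference in expression.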

The technical quality of sequencing long ago reached a point where there is no need to do technical replicates (and people have not been doing those for 5+ years).


Thanks Genomax.

I read the key point as "Since you are sampling from the same library data produced by multiple runs should have similar distribution of reads."

So lanes/runs don't really matter as long as they come from the same library? For clarity, I'm defining a library as a single Eppendorf tube containing a sample sent for sequencing.

So does this mean that if an aliquot from sample A is sequenced on day 1, and a second aliquot from the same sample is sequenced on day 2, then I can merge the results of those two sequencing runs for a greater sequencing yield?

If I'm getting these points wrong, maybe you could direct me to a resource that explains it in detail?

written 9 months ago by YaGalbi1.4k

One library is one prep done from a sample aliquot. One can make multiple libraries from a sample. While similar, they would not be identical. If you deliberately select for different insert sizes, then they definitely would be different.

People do sequence libraries over time if more data is needed. Prepared libraries are stable as long as they are stored properly. At some point in processing the data can be merged for final analysis.
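
One place where that merge can happen is at the count-table stage. The Python sketch below sums per-gene counts over runs of the same library while keeping different libraries separate. The function name `merge_run_counts`, the run and library labels, and the counts are all hypothetical.

```python
from collections import defaultdict


def merge_run_counts(run_counts, run_to_library):
    """Sum per-gene counts over runs that came from the same library.

    run_counts: dict run name -> dict gene -> count
    run_to_library: dict run name -> library name
    """
    merged = defaultdict(lambda: defaultdict(int))
    for run, gene_counts in run_counts.items():
        library = run_to_library[run]
        for gene, count in gene_counts.items():
            merged[library][gene] += count
    return {library: dict(genes) for library, genes in merged.items()}


# Hypothetical example: library libA sequenced on two days, libB once.
run_counts = {
    "day1": {"geneX": 5, "geneY": 0},
    "day2": {"geneX": 7, "geneY": 3},
    "day3": {"geneX": 1},
}
run_to_library = {"day1": "libA", "day2": "libA", "day3": "libB"}
merged_counts = merge_run_counts(run_counts, run_to_library)
```

Here `day1` and `day2` end up as a single `libA` column with summed counts, while `libB`, being a separate prep, stays its own column.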

written 9 months ago by genomax64k