Question: how to combine RNA seq data from 4 lanes
3.2 years ago by
jolin0701-dy70 wrote:

I just got my RNA-seq data.

I only know that the files come from 4 different lanes during sequencing.

What are they and do I need to combine these fastq files into one? And how?


rna-seq • 3.9k views
modified 8 months ago by paumarc10 • written 3.2 years ago by jolin0701-dy70

If I had paired-end data, I would combine all the forward read files together and all the reverse read files together, across all lanes and all flow cells. Is that right (assuming there are no batch effects due to lanes and flow cells)?

One concern I have is that the index (barcode) sequences differ between the flow cells. Will this affect my analysis? Thanks!

written 2.1 years ago by apuhegde20
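A minimal sketch of the paired-end concatenation described in this comment. It assumes Illumina-style file names like those used elsewhere in this thread, four lanes, and uncompressed FASTQ; `merge_lanes` and the `SAMPLE_S1` prefix are hypothetical names. The point it illustrates is that R1 and R2 must be concatenated in the same lane order so that read pairs stay in sync.

```shell
# Concatenate the lanes separately for each read direction.
# The glob expands in sorted order (L001, L002, ...), so the R1 and R2
# outputs are built in the same lane order and mates stay paired.
# Assumption: all four lane files exist for both R1 and R2.
merge_lanes() {
    prefix="$1"   # hypothetical sample prefix, e.g. SAMPLE_S1
    for read in R1 R2; do
        cat "${prefix}"_L00[1-4]_"${read}"_001.fastq \
            > "${prefix}_merged_${read}_001.fastq"
    done
}
```

Calling `merge_lanes SAMPLE_S1` would write `SAMPLE_S1_merged_R1_001.fastq` and `SAMPLE_S1_merged_R2_001.fastq` in the current directory.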

If it is the same sample (where multiple libraries were made using separate indexes) and then run on multiple lanes/flow cells, you could (in theory) combine the R1 and R2 files. It would be much better to use read groups to manage the aligned BAM files, keeping the raw data separate.

For obvious reasons one shouldn't combine data from unrelated samples (since that would defeat the purpose of indexing them in the first place).

modified 2.1 years ago • written 2.1 years ago by genomax68k
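To illustrate the read-group idea, here is a rough sketch that derives a SAM `@RG` header line from an Illumina-style file name, so each lane can be tagged with its own read group at alignment time. The helper name `rg_line` is hypothetical, the sample name is assumed to contain no underscore, and using only the lane as the platform unit (`PU`) is a simplification, since the file name does not carry the flow cell ID. How the string is passed on depends on the aligner.

```shell
# Build an @RG line from a name such as CZ1_S6_L001_R1_001.fastq.
# Assumption: the name follows the sample_Sn_Lnnn_Rn_001.fastq pattern
# and the sample part contains no underscore.
rg_line() {
    base=$(basename "$1")
    sample=${base%%_*}    # text before the first underscore, e.g. CZ1
    # Extract the lane token, e.g. L001
    lane=$(printf '%s\n' "$base" | sed -n 's/.*_\(L[0-9][0-9][0-9]\)_.*/\1/p')
    # Fields are tab-separated, as in a SAM header
    printf '@RG\tID:%s.%s\tSM:%s\tPL:ILLUMINA\tPU:%s\n' \
        "$sample" "$lane" "$sample" "$lane"
}
```

Giving every lane a distinct `ID` but the same `SM` keeps lanes distinguishable after the BAM files are merged while still marking them as one sample.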

Align the four FASTQ files separately and merge the BAM files with samtools merge.

modified 8 months ago • written 8 months ago by paumarc10
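A small sketch of the merge step, assuming the per-lane BAM files already exist and are coordinate-sorted, which is what `samtools merge` expects by default. The wrapper name `merge_lane_bams` and the `SAMTOOLS` override variable are hypothetical conveniences, not part of samtools.

```shell
# Merge per-lane BAM files into a single BAM.
# Usage: merge_lane_bams out.bam lane1.bam lane2.bam ...
# Assumption: inputs are already coordinate-sorted.
merge_lane_bams() {
    out="$1"; shift
    # Fail early if any input file is missing
    for bam in "$@"; do
        [ -f "$bam" ] || { echo "missing input: $bam" >&2; return 1; }
    done
    # SAMTOOLS (hypothetical) lets you point at a specific samtools binary
    "${SAMTOOLS:-samtools}" merge "$out" "$@"
}
```

For example, `merge_lane_bams CZ1_merged.bam CZ1_L001.bam CZ1_L002.bam CZ1_L003.bam CZ1_L004.bam`, where the BAM names are placeholders.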
3.2 years ago by
United States
genomax68k wrote:

It appears to be the same sample run in 4 separate lanes. You can cat the files together into one or process them independently (giving you a way to parallelize). You can merge the 4 sample bam files into one (and then sort) at the end.

modified 3.2 years ago • written 3.2 years ago by genomax68k
3.2 years ago by
WouterDeCoster39k wrote:

You might first want to QC your data lane by lane to check for lane effects.

Based on the filenames, it looks like your data is single-end. In that case you can just concatenate the files for downstream analysis.


cat CZ1_S6_L001_R1_001.fastq CZ1_S6_L002_R1_001.fastq CZ1_S6_L003_R1_001.fastq CZ1_S6_L004_R1_001.fastq > CZ1_S6_merged_R1_001.fastq
modified 8 months ago • written 3.2 years ago by WouterDeCoster39k
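As a very basic lane-by-lane check before concatenating, one could compare read counts across the four lanes. This sketch assumes uncompressed FASTQ with exactly four lines per record; `count_reads` is a hypothetical helper, and a proper QC tool would report far more than this.

```shell
# Count reads in a FASTQ file (4 lines per record).
# Assumption: sequence and quality lines are not wrapped.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Print one "file<TAB>reads" line per lane file
for f in CZ1_S6_L00[1-4]_R1_001.fastq; do
    [ -f "$f" ] || continue    # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

The same function can be run on the merged file afterwards: its count should equal the sum of the per-lane counts.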