Question: How to combine RNA-seq data from 4 lanes
2.7 years ago by
jolin0701-dy60 wrote:

I just got my RNA-seq data.

I only know that the files come from 4 different lanes during sequencing.

What are they, and do I need to combine these FASTQ files into one? If so, how?


rna-seq • 3.2k views
modified 9 weeks ago by paumarc10 • written 2.7 years ago by jolin0701-dy60

If I had paired-end data, I would combine all the forward read files together and all the reverse read files together, across all lanes and all flow cells. Is that right (assuming there are no batch effects due to lanes and flow cells)?

One concern I have is that the index (barcode) sequences differ between flow cells. Will this affect my analysis? Thanks!
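
A minimal sketch of that paired-end concatenation, assuming uncompressed files named like the CZ1_S6_L001_R1_001.fastq pattern that appears elsewhere in this thread. The helper name merge_lanes and the output filename are made up for illustration. The point it shows is that R1 and R2 have to be concatenated in the same lane order so that mates stay paired:

```shell
# Hypothetical helper: concatenate the per-lane FASTQ files of one sample
# for one read direction.
# Usage: merge_lanes <sample_prefix> <R1|R2>
merge_lanes() {
    prefix=$1
    read_dir=$2
    # The glob expands in sorted order (L001, L002, ...), so calling this
    # once for R1 and once for R2 concatenates both in the same lane order.
    cat "${prefix}"_L00[0-9]_"${read_dir}"_001.fastq \
        > "${prefix}_merged_${read_dir}_001.fastq"
}
```

It would be called once per read direction, for example `merge_lanes CZ1_S6 R1` followed by `merge_lanes CZ1_S6 R2`.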

written 20 months ago by apuhegde20

If it is the same sample (where multiple libraries were made using separate indexes) and it was then run on multiple lanes/flow cells, you could, in theory, combine the R1 and R2 files. It would be much better to use read groups to manage the aligned BAM files, keeping the raw data separate.

For obvious reasons, one shouldn't combine data from unrelated samples (since that would defeat the purpose of indexing them in the first place).
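
One way to attach read groups to per-lane BAM files after alignment is `samtools addreplacerg` (many aligners can also set the read group at alignment time). The sketch below assumes samtools is installed. The helper name tag_lane_bam and the ID/SM/PL values are illustrative choices, not something specified in this thread:

```shell
# Hypothetical helper: add an @RG header line to a per-lane BAM and tag its
# reads with that read group.
# Usage: tag_lane_bam <sample> <lane> <in.bam> <out.bam>
tag_lane_bam() {
    sample=$1
    lane=$2
    in_bam=$3
    out_bam=$4
    # Each -r adds one field to the @RG line. ID is made unique per lane,
    # SM is shared so downstream tools treat all lanes as one sample.
    samtools addreplacerg \
        -r "ID:${sample}.${lane}" \
        -r "SM:${sample}" \
        -r "PL:ILLUMINA" \
        -o "$out_bam" "$in_bam"
}
```

For example, `tag_lane_bam CZ1_S6 L001 CZ1_S6_L001.bam CZ1_S6_L001.rg.bam` (filenames assumed). After merging, the lane of origin of each read can still be recovered from its RG tag.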

modified 19 months ago • written 19 months ago by genomax59k

Align the four FASTQ files separately and merge the BAM files with samtools merge.
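
A sketch of the merge step, assuming samtools is installed and each per-lane BAM is already coordinate-sorted (samtools merge expects sorted inputs). The helper name merge_lane_bams and the filenames in the usage line are hypothetical:

```shell
# Hypothetical helper: merge sorted per-lane BAM files into one and index it.
# Usage: merge_lane_bams <out.bam> <lane1.bam> <lane2.bam> ...
merge_lane_bams() {
    out_bam=$1
    shift
    # Positional form: output file first, then the inputs.
    # The index is only built if the merge succeeds.
    samtools merge "$out_bam" "$@" && samtools index "$out_bam"
}
```

For example: `merge_lane_bams CZ1_S6.bam CZ1_S6_L001.bam CZ1_S6_L002.bam CZ1_S6_L003.bam CZ1_S6_L004.bam`.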

modified 9 weeks ago • written 9 weeks ago by paumarc10
2.7 years ago by
United States
genomax59k wrote:

It appears to be the same sample run in 4 separate lanes. You can cat the files together into one, or process them independently (which gives you a way to parallelize). If you process them independently, you can merge the 4 BAM files into one (and then sort) at the end.
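
The "process them independently" route could be sketched as a loop that starts one background job per lane. This assumes single-end files named like CZ1_S6_L001_R1_001.fastq. Both process_lanes and the alignment command passed to it are hypothetical: the thread does not name an aligner, so the command is supplied by the caller and is expected to take an input FASTQ and an output BAM path:

```shell
# Hypothetical helper: run a user-supplied alignment command once per lane,
# all lanes in parallel.
# Usage: process_lanes <sample_prefix> <align_cmd>
# <align_cmd> is called as: <align_cmd> <in.fastq> <out.bam>
process_lanes() {
    prefix=$1
    align_cmd=$2
    for fq in "${prefix}"_L00[0-9]_R1_001.fastq; do
        # One background job per lane; output is named after the input.
        "$align_cmd" "$fq" "${fq%.fastq}.bam" &
    done
    # Block until every lane has finished. Note that a bare wait does not
    # report whether individual jobs failed.
    wait
}
```

The resulting per-lane BAM files can then be merged and sorted as described above.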

modified 2.7 years ago • written 2.7 years ago by genomax59k
2.7 years ago by
WouterDeCoster35k wrote:

You might first want to QC your data lane by lane to check for lane effects.
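
As a very rough first look before a full QC run, the number of reads per lane can be compared with standard tools. This sketch assumes uncompressed FASTQ files with exactly four lines per record, and the helper name lane_read_counts is made up for illustration:

```shell
# Hypothetical helper: print "<file><TAB><read count>" for each FASTQ file.
# Usage: lane_read_counts <lane1.fastq> <lane2.fastq> ...
lane_read_counts() {
    for fq in "$@"; do
        lines=$(wc -l < "$fq")
        # Assumes 4 lines per read (header, sequence, '+', qualities).
        printf '%s\t%d\n' "$fq" $((lines / 4))
    done
}
```

A lane with a read count far from the others is worth a closer look. For a fuller per-lane report, a tool such as FastQC can be run on each file (for example `fastqc -o qc_dir file.fastq`, where qc_dir must already exist).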

Based on the filenames, I take it your data is single-end? In that case you can just concatenate the files for downstream analysis.


cat CZ1_S6_L001_R1_001.fastq CZ1_S6_L002_R1_001.fastq CZ1_S6_L003_R1_001.fastq CZ1_S6_L004_R1_001.fastq > CZ1_S6_merged_R1_001.fastq
modified 9 weeks ago • written 2.7 years ago by WouterDeCoster35k