Question: How to align samples that were sequenced across multiple flow cells and lanes
2.7 years ago by
United Kingdom/University of Edinburgh
Matina170 wrote:


I have a set of FASTQ files that I want to align to the reference genome. Each sample was sequenced on 2 different runs (flow cells) and 2 different lanes, so I have 4 files per sample. I am not sure when I should merge the files: before or after alignment? Previous posts I have read suggest merging after alignment, but I am not sure what is best in my case. Could I merge them using samtools, or do I simply cat one file onto the end of the other?

An example for sample1 is shown below (FC = flow cell, L = Lane)





Thanks a lot in advance!

Tags: rna-seq, alignment
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum126k wrote:

When aligning, you should specify the lane information in the read group (RG) (see e.g. How to choose the right RG,SM and LB values for alignment).
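The following is a minimal sketch of what "lane information in the read group" can look like for one sample sequenced on 2 flow cells x 2 lanes. The field choices are illustrative assumptions, not a standard: ID = sample.flowcell.lane, PU = flowcell.lane, the library name `lib1`, and platform `ILLUMINA`. The linked post discusses how to pick them. The helper names `read_group_line` and `as_bwa_argument` are hypothetical. `bwa mem -R` accepts `\t` escapes in place of real tabs, which makes the string easier to pass on a command line.

```python
def read_group_line(sample, library, flowcell, lane, platform="ILLUMINA"):
    """Return a tab-separated SAM @RG header line for one flow cell/lane.

    Hypothetical helper; the ID/PU naming scheme is an assumption.
    """
    unit = f"{flowcell}.{lane}"
    fields = [
        "@RG",
        f"ID:{sample}.{unit}",  # unique for every flow cell/lane of this sample
        f"SM:{sample}",         # identical for all four files of the sample
        f"LB:{library}",        # identical if the same library was re-sequenced
        f"PL:{platform}",
        f"PU:{unit}",           # platform unit: which flow cell and lane
    ]
    return "\t".join(fields)


def as_bwa_argument(rg_line):
    """Replace real tabs with literal '\\t' escapes, as accepted by bwa mem -R."""
    return rg_line.replace("\t", "\\t")


if __name__ == "__main__":
    # Hypothetical layout: sample1 on flow cells FC1/FC2, lanes 1/2.
    for fc in ("FC1", "FC2"):
        for lane in (1, 2):
            print(as_bwa_argument(read_group_line("sample1", "lib1", fc, lane)))
```

All four read groups share `SM` (and here `LB`), and they differ only in `ID` and `PU`. Downstream tools can therefore still tell the lanes apart after merging.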

You can align and sort each pair of FASTQ files separately and merge the resulting BAM files later (see e.g. Merging Bam Files).
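As a hedged illustration of "align and sort each pair, merge later", the sketch below only builds the shell commands for one sample and executes nothing. It assumes `bwa mem` as the aligner purely as a placeholder. `bwa mem` is not splice-aware, so for RNA-seq you would substitute a splice-aware aligner and its own read-group option. The function name `per_lane_commands` and all file names are hypothetical. Because each read carries its `RG` tag after alignment, the lanes stay distinguishable inside the merged BAM.

```python
import shlex


def per_lane_commands(sample, reference, units):
    """Build commands that align/sort each lane, then merge per sample.

    `units` is a list of (unit_name, read1_fastq, read2_fastq) tuples, one per
    flow cell/lane. Nothing is executed; the commands are only returned.
    """
    commands, sorted_bams = [], []
    for unit, read1, read2 in units:
        # Read group written with literal \t escapes, which bwa mem -R accepts.
        rg = f"@RG\\tID:{sample}.{unit}\\tSM:{sample}\\tPU:{unit}"
        bam = f"{sample}.{unit}.sorted.bam"
        align = ["bwa", "mem", "-R", rg, reference, read1, read2]
        sort = ["samtools", "sort", "-o", bam, "-"]  # "-" = read SAM from stdin
        commands.append(f"{shlex.join(align)} | {shlex.join(sort)}")
        sorted_bams.append(bam)
    merged = f"{sample}.merged.bam"
    commands.append(shlex.join(["samtools", "merge", merged, *sorted_bams]))
    commands.append(shlex.join(["samtools", "index", merged]))
    return commands


if __name__ == "__main__":
    # Hypothetical file names for sample1: 2 flow cells x 2 lanes.
    units = [
        (f"{fc}.L{lane}", f"sample1_{fc}_L{lane}_R1.fastq.gz",
         f"sample1_{fc}_L{lane}_R2.fastq.gz")
        for fc in ("FC1", "FC2") for lane in (1, 2)
    ]
    for cmd in per_lane_commands("sample1", "reference.fa", units):
        print(cmd)
```

Keeping the per-lane BAMs until after the merge also lets you inspect each run on its own first.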


Thanks a lot for the reply. I was wondering why it is important to specify the RG.

written 2.6 years ago by Matina170

Read groups may be used to indicate which libraries are technical replicates of one another. That will help the variant caller decide how much variability comes from the instrument itself.

written 2.6 years ago by Istvan Albert ♦♦ 82k
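The following hypothetical sketch illustrates the bookkeeping described above. It reads the `@RG` lines of a SAM header and groups read-group IDs that share the same sample (`SM`) and library (`LB`). Such groups are technical replicates of one another. This is only an illustration and does not show how any particular variant caller works internally. The header text and the function names `parse_read_groups` and `technical_replicates` are invented for the example. On a real BAM, `samtools view -H` prints the header.

```python
from collections import defaultdict


def parse_read_groups(header_text):
    """Return one dict of tag -> value per @RG line in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # Each field after "@RG" looks like TAG:value; split on the first colon.
        groups.append(dict(field.split(":", 1) for field in line.split("\t")[1:]))
    return groups


def technical_replicates(read_groups):
    """Group read-group IDs that share the same sample (SM) and library (LB)."""
    by_library = defaultdict(list)
    for rg in read_groups:
        by_library[(rg.get("SM"), rg.get("LB"))].append(rg["ID"])
    return dict(by_library)


if __name__ == "__main__":
    # Hypothetical header: sample s1 re-sequenced on two flow cells.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:s1.FC1.1\tSM:s1\tLB:lib1\tPU:FC1.1\n"
        "@RG\tID:s1.FC2.1\tSM:s1\tLB:lib1\tPU:FC2.1\n"
    )
    print(technical_replicates(parse_read_groups(header)))
```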
2.7 years ago by
University Park, USA
Istvan Albert ♦♦ 82k wrote:

In general, it is probably best to keep these separate, as they form technical replicates and will help you assess potential biases between runs. You would be able to merge the alignment files later.
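The following is one crude way to look for a bias between runs, given as an illustration rather than a formal test. It assumes you already have per-gene read counts for the same sample from two runs. The gene names, counts, and function names are hypothetical. Each run is scaled to counts per million, and per-gene log2 ratios are computed. If neither run is systematically biased, the median ratio should sit near 0, and only a few genes should be flagged.

```python
import math
import statistics


def counts_per_million(counts):
    """Scale raw gene counts so each run sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def run_log2_ratios(run_a, run_b, pseudocount=1.0):
    """Per-gene log2(run_a / run_b) on the CPM scale, for genes seen in both runs."""
    a, b = counts_per_million(run_a), counts_per_million(run_b)
    shared = sorted(set(a) & set(b))
    return {g: math.log2((a[g] + pseudocount) / (b[g] + pseudocount)) for g in shared}


def summarize(ratios, threshold=1.0):
    """Median log2 ratio plus genes whose |ratio| is at least `threshold`."""
    median = statistics.median(ratios.values())
    discordant = [g for g, r in ratios.items() if abs(r) >= threshold]
    return median, discordant


if __name__ == "__main__":
    # Hypothetical counts for sample1 from flow cell 1 and flow cell 2.
    fc1 = {"geneA": 120, "geneB": 40, "geneC": 900}
    fc2 = {"geneA": 110, "geneB": 45, "geneC": 880}
    median, discordant = summarize(run_log2_ratios(fc1, fc2))
    print(f"median log2 ratio: {median:.3f}; discordant genes: {discordant}")
```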

If you were to perform a study that works best with maximal data (like a genome assembly), then merging them early on is recommended.
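When merging early is what you want, merging FASTQ files can be as simple as concatenating them. Gzip members written back to back form a valid gzip stream, so `cat` also works on `.fastq.gz` files. The following Python sketch does the same byte-level concatenation; file and function names are hypothetical. For paired-end data, concatenate the R1 and R2 files in the same order so mates stay paired. This applies to FASTQ only: BAM files should be combined with `samtools merge`, not `cat`.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_fastq_gz(inputs, output):
    """Concatenate gzipped FASTQ files byte for byte, like `cat a b > out`.

    Back-to-back gzip members form a valid multi-member gzip stream, so the
    inputs do not need to be decompressed first.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path):
    """Count FASTQ records (4 lines each) in a gzipped file."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh) // 4


if __name__ == "__main__":
    # Hypothetical tiny R1 files for one sample from two flow cells.
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)
        fc1, fc2 = tmp / "sample1_FC1_R1.fastq.gz", tmp / "sample1_FC2_R1.fastq.gz"
        with gzip.open(fc1, "wt") as fh:
            fh.write("@read1\nACGT\n+\nIIII\n")
        with gzip.open(fc2, "wt") as fh:
            fh.write("@read2\nTTGA\n+\nIIII\n")
        merged = tmp / "sample1_R1.fastq.gz"
        # For paired-end data, merge the R2 files in exactly the same order.
        concat_fastq_gz([fc1, fc2], merged)
        print(count_reads(merged))  # 2
```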


OK, I got it: align first, then merge. I want to do a simple differential expression analysis. Thanks a lot!

written 2.7 years ago by Matina170
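For a count-based differential expression analysis, "merge after alignment" can also be done at the count level. The per-lane gene counts of the same library are summed per sample, which is also what DESeq2's `collapseReplicates` does for technical replicates. The sketch below assumes the per-lane counts are already in dictionaries. The lane IDs, the mapping, and the name `collapse_lanes` are hypothetical. Only sum technical replicates: biological replicates must stay as separate samples.

```python
from collections import Counter


def collapse_lanes(per_lane_counts, lane_to_sample):
    """Sum gene counts of technical replicates (lanes/flow cells) per sample."""
    per_sample = {}
    for lane, counts in per_lane_counts.items():
        sample = lane_to_sample[lane]
        # Counter.update adds counts rather than replacing them.
        per_sample.setdefault(sample, Counter()).update(counts)
    return {s: dict(c) for s, c in per_sample.items()}


if __name__ == "__main__":
    # Hypothetical counts from two flow cells of sample1.
    per_lane = {
        "sample1.FC1.1": {"geneA": 5, "geneB": 1},
        "sample1.FC2.1": {"geneA": 3, "geneB": 2},
    }
    mapping = {"sample1.FC1.1": "sample1", "sample1.FC2.1": "sample1"}
    print(collapse_lanes(per_lane, mapping))
```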
Powered by Biostar version 2.3.0