Question: How to align the same samples that were sequenced in multiple flow cells and lanes
22 months ago by
Matina160
United Kingdom/University of Edinburgh
Matina160 wrote:

Hi,

I have a set of FASTQ files that I want to align to the reference genome. Each sample was sequenced on 2 different runs (flow cells) and 2 different lanes, so for each sample I have 4 files. I am not sure when I should merge my files: before or after alignment? I have read previous posts that suggest merging the samples after alignment, but I am not sure what is best in my case. Could I merge the samples using samtools? Or do I simply cat one file onto the end of the other?

An example for sample1 is shown below (FC = flow cell, L = Lane)

sample1.FC1.L1

sample1.FC1.L2

sample1.FC2.L1

sample1.FC2.L2

Thanks a lot in advance!

rna-seq alignment • 1.3k views
modified 22 months ago • written 22 months ago by Matina160
22 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

When aligning, you should specify the lane information in the read group (RG), e.g. see: How to choose the right RG,SM and LB values for alignment

You can align and sort each pair of FASTQ files and merge the resulting BAM files later, e.g. see: Merging Bam Files
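As a rough illustration of the read group advice, here is a minimal Python sketch that turns a name such as sample1.FC1.L1 into an @RG header line. The function name `read_group` is hypothetical, and the field choices are assumptions rather than anything stated in this thread: one library per sample (so LB is set to the sample name), flow cell plus lane as the platform unit (PU), and ILLUMINA as the platform (PL). How the resulting line is passed to the aligner depends on the aligner you use.

```python
def read_group(name, platform="ILLUMINA"):
    """Build an @RG header line from a name like 'sample1.FC1.L1'.

    Assumes the name is exactly '<sample>.<flowcell>.<lane>'.
    """
    sample, flowcell, lane = name.split(".")
    unit = f"{flowcell}.{lane}"
    fields = [
        "@RG",
        f"ID:{sample}.{unit}",  # must be unique within the merged BAM
        f"SM:{sample}",         # same SM for all four files of one sample
        f"LB:{sample}",         # assumption: a single library per sample
        f"PU:{unit}",           # platform unit: flow cell + lane
        f"PL:{platform}",       # assumption: Illumina sequencing
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    names = ["sample1.FC1.L1", "sample1.FC1.L2",
             "sample1.FC2.L1", "sample1.FC2.L2"]
    for name in names:
        print(read_group(name))
```

All four files share the same SM value, so downstream tools can treat them as one sample after merging, while the distinct ID and PU values keep track of which flow cell and lane each read came from.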

written 22 months ago by Pierre Lindenbaum119k

Thanks a lot for the reply. I was wondering why it is important to specify the RG.

written 21 months ago by Matina160

Read groups may be used to indicate which libraries are technical replicates of one another. That will help the variant caller decide how much variability comes from the instrument itself.

written 21 months ago by Istvan Albert ♦♦ 80k
22 months ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

In general it is probably best to keep these separate, as they form technical replicates and will help you assess potential biases between runs. You can merge the alignment files later.

If you were performing a study that works best with maximal data (like a genome assembly), then merging them early on would be recommended.
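To make the "align separately, merge later" route concrete, here is a small Python sketch that groups per-lane BAM files by sample and builds one `samtools merge` command per sample. The helper name `merge_commands`, the `<sample>.<flowcell>.<lane>.bam` input naming, and the `<sample>.merged.bam` output naming are all assumptions for illustration. The sketch only prints the commands; it does not run samtools.

```python
from collections import defaultdict


def merge_commands(bam_files):
    """Return one 'samtools merge' argument list per sample.

    Assumes each file is named '<sample>.<flowcell>.<lane>.bam' and
    has already been coordinate-sorted.
    """
    by_sample = defaultdict(list)
    for bam in bam_files:
        sample = bam.split(".")[0]  # text before the first dot
        by_sample[sample].append(bam)

    commands = []
    for sample, bams in sorted(by_sample.items()):
        # samtools merge takes the output file first, then the inputs
        commands.append(
            ["samtools", "merge", f"{sample}.merged.bam", *sorted(bams)]
        )
    return commands


if __name__ == "__main__":
    bams = [
        "sample1.FC1.L1.bam", "sample1.FC1.L2.bam",
        "sample1.FC2.L1.bam", "sample1.FC2.L2.bam",
    ]
    for cmd in merge_commands(bams):
        print(" ".join(cmd))
```

Keeping the per-lane BAM files until after the merge leaves the option of comparing the runs against each other before combining them.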

written 22 months ago by Istvan Albert ♦♦ 80k

OK, I got it: align first and then merge. I want to do a simple differential expression analysis. Thanks a lot!

written 22 months ago by Matina160