Fastq Files From Different Flowcells
8.3 years ago
hellbio

Hi,

For a single sample, I have several paired-end FASTQ files from four different flowcells, i.e. one FASTQ pair per lane on each flowcell. Instead of processing the files from each flowcell and lane individually, can I merge all the forward reads (from all flowcells and lanes) into a single FASTQ file, and all the reverse reads into another?

Thanks

8.3 years ago

Yes, you can (see BruceB's answer), but that's usually a bad idea: you lose the per-lane read groups and the speed gained by aligning lanes in parallel.

You can process the FASTQ files in parallel using, for example, make with the -j option (number of parallel jobs), and merge the SAM files later.
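For illustration, here is a minimal bash sketch of the per-lane approach. It uses background jobs and wait rather than make -j. Everything specific in it is a hypothetical assumption: the reference name ref.fa, file names like sample_L001_R1.fastq.gz / sample_L001_R2.fastq.gz, the read-group fields (SM:sample, PL:ILLUMINA), and bwa mem plus a reasonably recent samtools being on PATH.

```bash
#!/usr/bin/env bash
# Sketch only: align each lane separately and in parallel, then merge.
# Hypothetical assumptions: a bwa-indexed reference (REF, default ref.fa),
# per-lane files sample_<lane>_R1.fastq.gz / sample_<lane>_R2.fastq.gz,
# and bwa + samtools on PATH.
set -euo pipefail

REF=${REF:-ref.fa}

# Align one lane's read pair and write a coordinate-sorted BAM.
# Each lane gets its own @RG line, so the lane of origin survives the merge.
align_lane() {
  local lane=$1
  local rg="@RG\tID:${lane}\tSM:sample\tPL:ILLUMINA\tPU:${lane}"
  bwa mem -R "$rg" "$REF" \
    "sample_${lane}_R1.fastq.gz" "sample_${lane}_R2.fastq.gz" \
    | samtools sort -o "sample_${lane}.bam" -
}

# Start one background job per lane, then wait for each of them.
# `wait PID` returns that job's exit status, so with `set -e`
# a failed lane aborts the script.
align_all() {
  local lane pid
  local pids=()
  for lane in "$@"; do
    align_lane "$lane" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid"
  done
}

# Usage (lane names are hypothetical):
#   align_all L001 L002 L003 L004
#   samtools merge sample.bam sample_L00*.bam
```

Because every lane keeps its own read group, the merged BAM still records which lane each read came from. Simply concatenating the FASTQs would not give you that.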


Well, I have FASTQ files from 8 lanes, i.e. 8 pairs of forward and reverse reads. If we map them individually, we will end up with 8 SAM files that have to be merged, and it becomes so complex. So would it be wise to concatenate the FASTQ files and then generate a single SAM/BAM file?


If time is not a problem, concatenate your FASTQs. If you can align the 8 FASTQ pairs, convert to BAM and sort them as 8 jobs in parallel, you'll get your result faster.

"it becomes so complex": why? A Makefile will solve your problems.


Comment from @notSoJunkDNA (https://twitter.com/notSoJunkDNA/status/365440417212276736): "doesn't apply to all pipelines. Tophat for instance needs all the reads..."


Could you please elaborate on how a Makefile will solve the problem? Just curious...


With a Makefile you can use something like $(foreach N,1 2 3 4 5 6 7 8,$(eval $(call alignwithbwa,$(N)))), where alignwithbwa is a user-defined macro that expands to the rules for aligning lane $(N). See http://www.gnu.org/software/make/manual/html_node/Eval-Function.html


Could you please provide a sample Makefile that you have been using? A Makefile might make life easier in the case of WGS data.


I should also mention that my data is paired-end.


Why?

Anything else besides speed and RG (read groups)?


Is processing each lane separately faster than running bwa with the thread option on the merged FASTQ file? Which one is faster, Strategy A or B? I think the two strategies are equivalent.

Strategy A: Makefile

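    # ref.fa is a placeholder for the bwa-indexed reference
    bwa mem ref.fa lane1.fastq > lane1.sam
    bwa mem ref.fa lane2.fastq > lane2.sam
    bwa mem ref.fa lane3.fastq > lane3.sam
    bwa mem ref.fa lane4.fastq > lane4.sam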


Strategy B: merged

    # ref.fa is a placeholder for the bwa-indexed reference
    bwa mem -t 4 ref.fa all.lane.fastq > all.sam

8.3 years ago
BruceB

Yes, you can. The simplest way is with cat in the terminal, which concatenates the files you choose into one FASTQ file, e.g.

    cat R1_001.fq.gz R1_002.fq.gz ... R1_n.fq.gz > R1_combined.fq.gz

Do the same for the R2 files, listing them in the same order, so that the mates stay in sync.
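A small bash sketch that applies the cat approach to both reads at once. The helper name concat_pairs and the _R1_/_R2_ naming pattern are hypothetical. The point is that each R2 file is derived from its R1 file, so both merged files are built in exactly the same order.

```bash
#!/usr/bin/env bash
# Sketch: merge per-lane FASTQ pairs into one R1 file and one R2 file.
# Hypothetical naming: every R1 file contains "_R1_" and its mate file is
# identical except for "_R2_" (e.g. L1_R1_001.fq.gz / L1_R2_001.fq.gz).
set -euo pipefail

# concat_pairs OUT_PREFIX R1_FILE...
# Writes OUT_PREFIX_R1.fq.gz and OUT_PREFIX_R2.fq.gz.
concat_pairs() {
  local out=$1
  shift
  if [[ $# -eq 0 ]]; then
    echo "usage: concat_pairs OUT_PREFIX R1_FILE..." >&2
    return 1
  fi
  local r1 r2
  local r2_files=()
  for r1 in "$@"; do
    r2=${r1/_R1_/_R2_}  # derive the mate file from the R1 name
    if [[ $r2 == "$r1" || ! -e $r2 ]]; then
      echo "no R2 mate found for $r1" >&2
      return 1
    fi
    r2_files+=("$r2")
  done
  # Both outputs come from the same ordered list, so mates stay in sync.
  # No decompression needed: concatenated gzip members form a valid gzip file.
  cat "$@" > "${out}_R1.fq.gz"
  cat "${r2_files[@]}" > "${out}_R2.fq.gz"
}
```

For example, concat_pairs sample *_R1_001.fq.gz. The glob expands in sorted order, and the R2 list follows it automatically.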


So it can be done by concatenating all the forward reads into 1_fastq.gz and all the reverse reads into 2_fastq.gz, and then mapping the paired-end files to a single BAM file?
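Before mapping, one can sanity-check that the two merged files are still in mate order. This bash sketch uses hypothetical helper names read_names and check_pairs. It compares the first word of every header line after stripping any trailing /1 or /2, and it assumes plain 4-line records.

```bash
#!/usr/bin/env bash
# Sketch: check that two merged FASTQ files are still in mate order.
# Compares the first word of each header line (line 1 of every 4-line
# record) with any trailing /1 or /2 removed. Assumes no wrapped records.
set -euo pipefail

# Print one normalised read name per record from a gzipped FASTQ file.
read_names() {
  gzip -dc "$1" | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# check_pairs R1.fq.gz R2.fq.gz: exit status 0 only if every record pairs up.
check_pairs() {
  if cmp -s <(read_names "$1") <(read_names "$2"); then
    echo "OK: $1 and $2 are in mate order"
  else
    echo "MISMATCH: $1 and $2 differ in read names or record count" >&2
    return 1
  fi
}
```

With the file names from the question, that would be check_pairs 1_fastq.gz 2_fastq.gz.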


Yes, that is exactly what I would do (and have done in the recent past). Once concatenated, you would never know they came from different lanes.


Not exactly: the flowcell ID and lane number are also recorded in each read's sequence identifier, so reads can still be traced back to their lane after concatenation. See http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Each entry in a FASTQ file consists of four lines:
• Sequence identifier
• Sequence
• Quality score identifier line (consisting of a +)
• Quality score

Each sequence identifier, the line that precedes the sequence and describes it, needs to be in the following format:

    @<instrument>:<run number>:<flowcell id>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>
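Because the lane is part of every identifier, the lane information can still be recovered from a concatenated file. The bash sketch below counts reads per flowcell:lane. It assumes CASAVA 1.8-style headers laid out exactly as above and plain 4-line records, and lane_counts is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: count reads per flowcell:lane in a (possibly merged) FASTQ file.
# Assumes the header layout above, split on ":":
#   field 3 = flowcell id, field 4 = lane.
# Older (pre-1.8) Illumina headers use a different layout.
set -euo pipefail

# lane_counts FILE.fq.gz: prints "<flowcell id>:<lane> <read count>" lines.
lane_counts() {
  gzip -dc "$1" | awk -F: '
    NR % 4 == 1 { n[$3 ":" $4]++ }
    END { for (k in n) print k, n[k] }' | sort
}
```

Run on a file concatenated from several lanes, this should print one line per flowcell:lane pair. That is the information you would need to recreate per-lane read groups.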