Question

Fastq Files From Different Flowcells

5

10.7 years ago

hellbio ▴ 520

Hi,

For a single sample, I have several paired-end FASTQ files from four different flowcells, i.e. FASTQ files from different lanes of each flowcell. Instead of processing the individual FASTQ files separately, can I merge all the forward reads (from the different flowcells and lanes) into a single FASTQ file and all the reverse reads into another FASTQ file?

Thanks

fastq • 18k views

ADD COMMENT • link updated 10.7 years ago by Pierre Lindenbaum 161k • written 10.7 years ago by hellbio ▴ 520

score 5 · Answer 1 · 2013-08-08

5

10.7 years ago

Pierre Lindenbaum 161k

Yes, you can (see BruceB's answer), but that's usually a bad idea.

You can process the FASTQ files in parallel using, for example, make with the -j option (number of parallel jobs), and merge the SAM files later.
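For illustration, the per-lane commands could look roughly like the sketch below. It is a dry run that only prints the commands. The names ref.fa, sample1 and laneN_R1/R2.fq.gz are placeholders of mine, not files from this thread, and the samtools calls assume samtools 1.x. Tagging each lane with its own read group (bwa mem -R) is one commonly cited reason to keep lanes separate until after alignment.

```shell
# Assumed names: ref.fa, sample1 and laneN_R1/R2.fq.gz are placeholders.
ref=ref.fa
sample=sample1
cmds=$(
  for lane in 1 2 3 4; do
    # One read group per lane; bwa mem turns the literal "\t" in -R into tabs.
    rg="@RG\\tID:lane${lane}\\tSM:${sample}"
    printf '%s\n' "bwa mem -R '${rg}' ${ref} lane${lane}_R1.fq.gz lane${lane}_R2.fq.gz | samtools sort -o lane${lane}.bam -"
  done
)
# Dry run: print the per-lane commands, then the final merge step.
printf '%s\n' "$cmds"
echo "samtools merge merged.bam lane1.bam lane2.bam lane3.bam lane4.bam"
```

Each printed line is independent of the others, so the lanes can be run concurrently (as background jobs, or as make recipes) before the single merge at the end.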

ADD COMMENT • link 10.7 years ago by Pierre Lindenbaum 161k

0

Well, I have FASTQ files from 8 lanes, i.e. 8 pairs of forward and reverse reads. If we map them individually, we end up with 8 SAM files that have to be merged, which seems complex. So would it be wise to concatenate the FASTQ files first and then generate a single SAM/BAM file?

ADD REPLY • link 10.7 years ago by hellbio ▴ 520

0

If time is not a problem, concatenate your FASTQs. But if you can align the 8 pairs of FASTQ files, convert to BAM and sort as 8 jobs in parallel, then you'll get your result faster.

"it becomes so complex...": why? A makefile will solve your problems.

ADD REPLY • link 10.7 years ago by Pierre Lindenbaum 161k

0

comment from @notSoJunkDNA ( https://twitter.com/notSoJunkDNA/status/365440417212276736 ) "doesn't apply to all pipelines. Tophat for instance needs all the reads..."

ADD REPLY • link 10.7 years ago by Pierre Lindenbaum 161k

0

Could you please elaborate on how a makefile will solve the problem? Just curious...

ADD REPLY • link 10.7 years ago by Sebastian Kurscheid ▴ 300

0

With a makefile you can use something like $(foreach FASTQ,1 2 3 4 5 6 7 8,$(eval $(call alignwithbwa,${FASTQ}))) . See http://www.gnu.org/software/make/manual/html_node/Eval-Function.html
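To make the foreach/eval/call idea concrete, here is a minimal sketch rather than anyone's production pipeline. The shell snippet writes a small GNU makefile and dry-runs it. ref.fa, the laneN_R1/R2.fastq.gz names and the empty placeholder files are all assumptions for the demo, and the body of alignwithbwa is my guess at what such a template might contain.

```shell
# Everything here is illustrative: ref.fa, the laneN_R1/R2.fastq.gz names and
# the empty placeholder files are assumptions, not files from the thread.
work=$(mktemp -d)
cd "$work" || exit 1
for i in 1 2 3 4 5 6 7 8; do : > "lane${i}_R1.fastq.gz"; : > "lane${i}_R2.fastq.gz"; done

# Recipes follow ';' on the rule line, so the file needs no literal tab characters.
cat > Makefile <<'EOF'
REF   := ref.fa
LANES := 1 2 3 4 5 6 7 8

all: merged.bam

# Template rule: $(1) is the lane number; $$ defers expansion until the rule runs.
define alignwithbwa
lane$(1).sorted.bam: lane$(1)_R1.fastq.gz lane$(1)_R2.fastq.gz ; bwa mem $$(REF) $$^ | samtools sort -o $$@ -
endef

# Instantiate the template once per lane.
$(foreach FASTQ,$(LANES),$(eval $(call alignwithbwa,$(FASTQ))))

merged.bam: $(foreach FASTQ,$(LANES),lane$(FASTQ).sorted.bam) ; samtools merge $@ $^
EOF

# foreach/eval/call are GNU make features, so only dry-run with GNU make.
# -n prints the commands without executing them.
if make --version 2>/dev/null | grep -q 'GNU Make'; then make -n; fi
```

With real input files and an indexed reference in place, make -j 8 would run the eight alignment rules concurrently and then the single merge.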

ADD REPLY • link 10.7 years ago by Pierre Lindenbaum 161k

0

Could you please provide a sample makefile that you have been using? A makefile might make life easier in the case of WGS data.

ADD REPLY • link 9.6 years ago by hellbio ▴ 520

0

search github: https://gist.github.com/search?l=makefile&q=mpileup

ADD REPLY • link 9.6 years ago by Pierre Lindenbaum 161k

0

I would also like to mention that my data is paired-end data

ADD REPLY • link 9.6 years ago by hellbio ▴ 520

0

Why is it "usually a bad idea"? Is there anything else besides speed and read groups (RG)?

ADD REPLY • link 7.6 years ago by Medhat 9.7k

0

Is processing each lane separately faster than running bwa on the merged FASTQ file with the threads option? Which one is faster, strategy A or B? I think the two strategies are equivalent.

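Strategy A: Makefile, one job per lane (ref.fa is a placeholder for the indexed reference)

    bwa mem ref.fa lane1.fastq > lane1.sam
    bwa mem ref.fa lane2.fastq > lane2.sam
    bwa mem ref.fa lane3.fastq > lane3.sam
    bwa mem ref.fa lane4.fastq > lane4.sam

Strategy B: merged file, 4 threads

    bwa mem -t 4 ref.fa all.lane.fastq > all.lane.sam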

ADD REPLY • link 6.8 years ago by sacha ★ 2.4k

score 3 · Answer 2 · 2013-08-08

3

10.7 years ago

BruceB ▴ 340

Yes, you can. The simplest way of doing this is with cat on the terminal. This will concatenate the files you choose into one FASTQ file, e.g.:

    cat R1_001.fq.gz R1_002.fq.gz ... R1_n.fq.gz > R1_combined.fq.gz
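This works on the compressed files directly because the gzip format allows several compressed members to be joined end to end. As a quick sanity check, the sketch below builds two tiny one-record FASTQ files (made-up reads standing in for two lanes), concatenates them and confirms that the result still decompresses cleanly. For paired-end data, the R1 files and the R2 files must be concatenated in the same order so that the mates stay in sync.

```shell
# Hypothetical data: two one-record FASTQ files stand in for two lanes.
tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/R1_001.fq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/R1_002.fq.gz"

# Concatenate the compressed files directly, without decompressing first.
cat "$tmp/R1_001.fq.gz" "$tmp/R1_002.fq.gz" > "$tmp/R1_combined.fq.gz"

# Check integrity, then count records (4 lines per record).
gzip -t "$tmp/R1_combined.fq.gz" && echo "gzip stream OK"
n_lines=$(gzip -dc "$tmp/R1_combined.fq.gz" | wc -l)
echo "records: $((n_lines / 4))"
```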

ADD COMMENT • link 10.7 years ago by BruceB ▴ 340

0

So it can be done by concatenating all the forward reads into 1_fastq.gz and all the reverse reads into 2_fastq.gz, and then mapping the paired-end files to a single BAM file?

ADD REPLY • link 10.7 years ago by hellbio ▴ 520

0

Yes, that is exactly what I would do (and have done in the recent past). Once concatenated, you would never know they came from different lanes.

ADD REPLY • link 10.7 years ago by BruceB ▴ 340

0

Not exactly: the lane number is also recorded in the sequence identifier, see http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Each entry in a FASTQ file consists of four lines:
• Sequence identifier
• Sequence
• Quality score identifier line (consisting of a +)
• Quality score

Each sequence identifier, the line that precedes the sequence and describes it, needs to be in the following format:

@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>
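So even after concatenation the flowcell and lane can be recovered from the read names, provided the headers follow this layout. A minimal sketch, using two made-up reads (the instrument, run and flowcell names are invented), that counts reads per flowcell and lane:

```shell
# Two made-up reads in the header layout quoted above.
fastq='@SIM:1:FCX:1:15:6329:1045 1:N:0:2
TCGCAC
+
<>;##=
@SIM:1:FCX:2:13:304:1110 1:N:0:2
ACGTAC
+
IIIIII'

# Header lines are every 4th line starting at line 1 (a quality line may also
# begin with '@', so line position is safer than matching on '@').
# With ':' as separator, field 3 is the flowcell ID and field 4 the lane.
lanes=$(printf '%s\n' "$fastq" \
  | awk -F: 'NR % 4 == 1 {print $3 ":" $4}' \
  | sort | uniq -c | awk '{print $2, $1}')
printf '%s\n' "$lanes"
```

Each output line is a flowcell:lane pair followed by its read count. On real data the same awk filter would be fed from gzip -dc on the combined file.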

ADD REPLY • link 7.9 years ago by chen ★ 2.5k