
Split Fastq files


4.4 years ago

bsmith030465

Hi,

I just got some fastq files from our sequencing center. The folder names are:

DXC-1-1_lane2_20180520000/
DXC-1-1_lane4_20180520000/
DXC-1-1_lane1_20180520000/
DXC-1-1_lane3_20180520000/

DXC-1-2_lane2_20180520000/
DXC-1-2_lane4_20180520000/
DXC-1-2_lane1_20180520000/
DXC-1-2_lane3_20180520000/

DXC-1-3_lane2_20180520000/
DXC-1-3_lane4_20180520000/
DXC-1-3_lane1_20180520000/
DXC-1-3_lane3_20180520000/

.

DXC-1-5_lane3_20180520000/

Each folder above has a forward and a reverse fastq.gz file. Does this mean that, for each sample, the FASTQ data has been split into 5 parts (one per lane), and that I'll have to combine the forward and reverse reads for each lane to get one set of FASTQ files for each sample?
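Assuming every folder follows the `<sample>_lane<N>_<run tag>/` pattern visible in the listing above, a short Python sketch can group the folders by sample to show which lanes each sample was spread across. The regex and the function name `group_lanes_by_sample` are illustrative assumptions, not something provided by the sequencing center.

```python
import re
from collections import defaultdict

# Assumed folder pattern: <sample>_lane<N>_<run tag>/, e.g. DXC-1-1_lane2_20180520000/
FOLDER_RE = re.compile(r"^(?P<sample>.+)_lane(?P<lane>\d+)_(?P<run>\d+)/?$")


def group_lanes_by_sample(folder_names):
    """Map each sample name to the sorted list of lane numbers it appears on."""
    lanes = defaultdict(list)
    for name in folder_names:
        match = FOLDER_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected folder name: {name!r}")
        lanes[match.group("sample")].append(int(match.group("lane")))
    return {sample: sorted(found) for sample, found in lanes.items()}


folders = [
    "DXC-1-1_lane2_20180520000/",
    "DXC-1-1_lane4_20180520000/",
    "DXC-1-1_lane1_20180520000/",
    "DXC-1-1_lane3_20180520000/",
    "DXC-1-2_lane2_20180520000/",
    "DXC-1-2_lane1_20180520000/",
]
print(group_lanes_by_sample(folders))
# {'DXC-1-1': [1, 2, 3, 4], 'DXC-1-2': [1, 2]}
```

If each sample name maps to several lanes, the sample's library was sequenced on several lanes, and the per-lane files belong together.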

Is there a webpage that would explain this?

thanks!

Tags: fastq, illumina

Hello bsmith030465

Why did you edit your post? There is no change in the content and you already have answers that solved the question.

Reply by Ram, 4.4 years ago

4.4 years ago

Pierre Lindenbaum

that I'll have to combine the forward and reverse reads for each lane to get one set of fastq files for each sample?

If there is more than one pair of FASTQ files for each sample, you can take advantage of this by parallelizing your processing.

First, map each FASTQ pair with bwa and sort the resulting SAM into a BAM (one process per FASTQ pair).

Then merge the BAM files by sample.
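The two steps above can be sketched in Python as command strings: one `bwa mem | samtools sort` job per lane (these can run in parallel) and one `samtools merge` per sample. The reference path `ref.fa`, the thread count, the output names like `DXC-1-1_lane1.sorted.bam`, and the helper names are hypothetical; the sketch only builds the commands, it does not run them.

```python
import shlex


def per_lane_commands(sample, lanes, reference="ref.fa", threads=4):
    """Build one `bwa mem | samtools sort` command per lane.

    `lanes` is a list of (lane_number, r1_path, r2_path) tuples (hypothetical layout).
    Returns the commands and the sorted BAM path each one writes.
    """
    commands, bams = [], []
    for lane, r1, r2 in lanes:
        bam = f"{sample}_lane{lane}.sorted.bam"
        cmd = (
            f"bwa mem -t {threads} {shlex.quote(reference)} "
            f"{shlex.quote(r1)} {shlex.quote(r2)} "
            f"| samtools sort -o {shlex.quote(bam)} -"  # '-' = read SAM from stdin
        )
        commands.append(cmd)
        bams.append(bam)
    return commands, bams


def merge_command(sample, bams):
    """Build a `samtools merge <out.bam> <in1.bam> <in2.bam> ...` command for one sample."""
    paths = [f"{sample}.bam", *bams]
    return "samtools merge " + " ".join(shlex.quote(p) for p in paths)


cmds, bams = per_lane_commands(
    "DXC-1-1",
    [(1, "lane1_R1.fastq.gz", "lane1_R2.fastq.gz"),
     (2, "lane2_R1.fastq.gz", "lane2_R2.fastq.gz")],
)
for c in cmds:
    print(c)
print(merge_command("DXC-1-1", bams))
```

The per-lane commands are independent of each other, so they could be handed to a job scheduler or run concurrently; only the merge has to wait for all of a sample's lanes to finish.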

Accepted Answer (2019-11-13)

Does this mean that, for each sample, the FASTQ data has been split into 5 parts (one per lane), and that I'll have to combine the forward and reverse reads for each lane to get one set of FASTQ files for each sample?

In theory, there could be QC differences between lanes (for example, a fluidics blockage or a bubble in one lane), but in general you can and should combine data from different lanes. If all the reads come from one Illumina library, having it split across different lanes, or even different flowcells, is not a problem: the Illumina instrument doesn't add any technical batch effects at that step.
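One way to combine lanes at the FASTQ level, assuming the per-lane files are gzip-compressed as described in the question, is to concatenate them per read direction: a concatenation of gzip streams is itself a valid multi-member gzip file, so no decompression is needed. The helper `concat_gzipped_fastq` and the file names are hypothetical; the R1 and R2 files must be passed in the same lane order so that mates stay in matching positions.

```python
import gzip
import os
import shutil
import tempfile


def concat_gzipped_fastq(lane_files, out_path):
    """Concatenate gzip-compressed FASTQ files byte for byte into out_path.

    The result is a multi-member gzip file, which gzip readers decompress
    as one continuous stream.
    """
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Tiny demo with two made-up single-read "lanes" (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    lanes = []
    for lane, seq in [(1, "ACGT"), (2, "TTGA")]:
        path = os.path.join(tmp, f"lane{lane}_R1.fastq.gz")
        with gzip.open(path, "wt") as fh:
            fh.write(f"@read{lane}\n{seq}\n+\nIIII\n")
        lanes.append(path)
    merged = os.path.join(tmp, "sample_R1.fastq.gz")
    concat_gzipped_fastq(lanes, merged)
    with gzip.open(merged, "rt") as fh:
        print(fh.read().splitlines()[::4])  # read headers only
# ['@read1', '@read2']
```

The same call would be repeated for the R2 files with the lanes in the same order, giving one R1/R2 pair per sample.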

I assume DXC-1-1 and DXC-1-2 are different samples and should not be combined, but you would know that better than anyone here.