Question

Why split fastq files?

0

2.8 years ago

shpak.max ▴ 50

I've returned to the bioinformatics world after an absence of several years, so the following question may seem naive:

I've inherited scripts from a former colleague for an alignment pipeline. Most of it is straightforward, but I'm unsure about one of the preliminary pre-processing steps.

Namely, there's a Perl script that splits each FASTQ file into chunks containing a fixed number of reads, with the chunk size given as an input argument. If the alignments (using bwa) were parallelized, I could see the reason for doing this, but in the absence of parallelization, what is gained computationally by splitting FASTQ files?
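For readers unfamiliar with this kind of pre-processing step, the splitting described above can be sketched roughly as follows. This is a hypothetical Python stand-in, not the inherited Perl script; the function name `split_fastq`, the chunk file naming, and the assumption of plain four-line FASTQ records (no wrapped sequence lines, no compression) are all illustrative.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, reads_per_chunk, out_dir):
    """Split a FASTQ file into pieces of at most reads_per_chunk reads.

    Assumes each record occupies exactly 4 lines (illustrative assumption).
    Returns the list of chunk paths, in input order.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Pull the next block of records: 4 lines per read.
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4:
                raise ValueError("input ends in the middle of a FASTQ record")
            chunk_path = out_dir / f"chunk_{len(chunk_paths):04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Calling `split_fastq("reads.fastq", 1_000_000, "chunks")` would write `chunks/chunk_0000.fastq`, `chunks/chunk_0001.fastq`, and so on, with the last piece holding whatever reads remain.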

fastq alignments ngs bwa • 694 views

score 1 · Answer 1 · 2021-06-29

1

2.8 years ago

GenoMax 141k

One can align the pieces in parallel (brute-force parallelization) and then merge the resulting BAM files.

0

Could you clarify what is meant by "brute-force parallelization" (as opposed to parallelization proper)?

Reply by shpak.max ▴ 50, 2.8 years ago

1

Starting multiple alignment jobs using fastq file pieces. You can then merge the resulting BAM files to produce a final single file.

Reply by GenoMax 141k, 2.8 years ago
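The "start multiple jobs, then merge" approach described in this thread can be sketched as below. This is only a minimal illustration under several assumptions: single-end reads, `bwa mem` as the aligner (the thread only says "bwa"), `bwa` and `samtools` available on `PATH`, and an already indexed reference. The helper names `align_chunk`, `align_all`, and `merge_command` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def align_chunk(reference, chunk_fastq, out_bam):
    """Align one FASTQ piece and write a coordinate-sorted BAM.

    Assumes single-end reads and `bwa mem` (illustrative choices).
    """
    bwa = subprocess.Popen(
        ["bwa", "mem", reference, str(chunk_fastq)], stdout=subprocess.PIPE
    )
    # Pipe SAM from bwa straight into samtools sort ("-" means stdin).
    subprocess.run(
        ["samtools", "sort", "-o", str(out_bam), "-"],
        stdin=bwa.stdout,
        check=True,
    )
    bwa.stdout.close()
    if bwa.wait() != 0:
        raise RuntimeError(f"bwa mem failed on {chunk_fastq}")
    return out_bam


def align_all(reference, chunk_fastqs, max_jobs=4, align=align_chunk):
    """Run one independent alignment job per piece, max_jobs at a time."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [
            pool.submit(align, reference, fq, f"{fq}.sorted.bam")
            for fq in chunk_fastqs
        ]
        # Results come back in the same order as the input pieces.
        return [future.result() for future in futures]


def merge_command(merged_bam, chunk_bams):
    """Build the samtools command that merges per-piece BAMs into one file."""
    return ["samtools", "merge", "-f", str(merged_bam), *map(str, chunk_bams)]
```

A run would then look like `bams = align_all("ref.fa", chunks)` followed by `subprocess.run(merge_command("merged.bam", bams), check=True)`. On a cluster, the same pattern is usually expressed as one scheduler job per piece rather than threads on one machine; the thread pool here only stands in for that.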