Using an aligner with parallel threading (bowtie2), how much faster is aligning then merging vs. merging then aligning?
9.0 years ago
kaston ▴ 40

I have a sample that has multiple fastq files, one for each lane that it was sequenced on. I was planning on merging these fastqs and then using the bowtie2 -p option to take advantage of all of the available cores on my machine. I have read that it can be faster to first align the multiple fastqs in parallel, then merge them into a single .sam file. But given that I am already using the -p option to parallelize alignment of my merged fastq, is this actually faster? For example, if I have 8 fastq files and 16 cores, which of the following is faster and by how much:

  • align all 8 fastqs in parallel using 2 cores each, then merge the .sams
  • merge fastqs then use all 16 cores to align

Thanks,
kaston

alignment bowtie2 parallelization • 3.5k views
9.0 years ago

The fastest option is likely the one you didn't mention: give bowtie2 the fastq files as a comma-separated list and use all 16 cores. Bowtie2 will then produce a single SAM file directly. Also, pipe the output into a BAM conversion rather than writing the SAM file; that's often faster, since I/O becomes the bottleneck as you increase the thread count.
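A minimal sketch of that suggestion, assuming single-end reads (for paired-end data, `-1` and `-2` each take their own comma-separated list). The file names, index prefix, and output name are hypothetical placeholders, not taken from the question:

```shell
# Hypothetical inputs: replace with your own lane files and index prefix.
index=genome_index
threads=16

# Join the per-lane FASTQ names into one comma-separated list.
fastq_list=$(printf '%s\n' \
    sample_L001.fastq.gz \
    sample_L002.fastq.gz \
    sample_L003.fastq.gz | paste -sd, -)

# Align all lanes in one bowtie2 run and convert to BAM on the fly,
# so the uncompressed SAM never touches the disk.
align_to_bam() {
    bowtie2 -p "$threads" -x "$index" -U "$fastq_list" \
        | samtools view -b -o sample.bam -
}
```

Calling `align_to_bam` runs the pipeline; nothing is executed until then. Whether this beats eight 2-core jobs on a given machine is worth timing on a small subset first.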


Devon, your response time is staggering! We'll try that, thanks!

Re: piping. We are currently writing the .sam files and then sorting and marking duplicates using Picard tools. The .sam files are ~200 GB in size and we have 70 GB of RAM available. I assume the point of piping is to avoid writing to disk, but is that avoidable with our file sizes and RAM constraints?

If piping is the way to go, we aren't sure what syntax to use. For sorting, our command is:

java -Xmx30g -Djava.io.tmpdir=$TEMPDIR -jar $PICARDDIR/SortSam.jar \
    INPUT=$ALNDIR/$sample.sam \
    OUTPUT=$ALNDIR/${sample}_sorted.bam \
    SORT_ORDER=coordinate

How would we change this to pipe the output of bowtie2?

I expect your answer in under 30 seconds.


Sorry to disappoint on the quickness of my reply, I blame the 6-9 hour time difference.

Piping obviously won't completely avoid writing to disk, but it's typically faster to convert to BAM and write that to disk than to write the raw SAM file.

Regarding piping with Picard, see this post: Piping Input Into Picard Sortsam. In short, just use /dev/stdin as the input. You could also use samtools (e.g., bowtie2 ...stuff... | samtools view -Su - | samtools sort -T prefix -O BAM - > foo.sorted.bam), though it doesn't index during sorting like Picard does, which is too bad.
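Spelled out, the two variants might look like the sketch below. The sample name, index prefix, and FASTQ list are hypothetical; `$TEMPDIR` and `$PICARDDIR` are assumed to be set as in your existing command. `CREATE_INDEX=true` is a standard Picard option, added here on the assumption that you want the index written alongside the sorted BAM.

```shell
# Hypothetical values: adjust to your own data.
sample=sample1
index=genome_index
fastq_list=lane1.fastq.gz,lane2.fastq.gz
threads=16
out_bam="${sample}_sorted.bam"   # braces keep the shell from reading $sample_sorted

# Option A: stream bowtie2 output straight into Picard SortSam via /dev/stdin.
sort_with_picard() {
    bowtie2 -p "$threads" -x "$index" -U "$fastq_list" \
        | java -Xmx30g -Djava.io.tmpdir="$TEMPDIR" -jar "$PICARDDIR/SortSam.jar" \
            INPUT=/dev/stdin \
            OUTPUT="$out_bam" \
            SORT_ORDER=coordinate \
            CREATE_INDEX=true
}

# Option B: convert and sort with samtools, then index as a separate step.
sort_with_samtools() {
    bowtie2 -p "$threads" -x "$index" -U "$fastq_list" \
        | samtools view -Su - \
        | samtools sort -T "${sample}_tmp" -O BAM -o "$out_bam" - \
        && samtools index "$out_bam"
}
```

Either function still spills temporary files to disk while sorting (under `java.io.tmpdir` for Picard, under the `-T` prefix for samtools), so a 200 GB alignment does not need to fit in RAM; the saving is that the uncompressed SAM is never written.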
