Question: Using an aligner with parallel threading (bowtie2) how much faster is aligning then merging vs. merging then aligning?
kaston (Canada) wrote, 3.4 years ago:

I have a sample that has multiple fastq files, one for each lane that it was sequenced on.  I was planning on merging these fastqs and then using the bowtie2 -p option to take advantage of all of the available cores on my machine.  I have read that it can be faster to first align the multiple fastqs in parallel, then merge them into a single .sam file.  But given that I am already using the -p option to parallelize alignment of my merged fastq, is this actually faster?  For example, if I have 8 fastq files and 16 cores, which of the following is faster and by how much:

- align all 8 fastqs in parallel using 2 cores each, then merge the .sams

- merge fastqs then use all 16 cores to align
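For concreteness, the two strategies could be sketched in bash roughly as follows. This is a minimal sketch, assuming single-end, uncompressed lane files and samtools 1.x on the PATH. The function names `option_a`/`option_b`, the index prefix and the file names are hypothetical. Option A joins the per-lane BAMs with `samtools cat` rather than a coordinate merge, because the per-lane outputs are unsorted.

```shell
#!/usr/bin/env bash
# Sketch of the two strategies in the question: one sample, 8 single-end
# lane FASTQs, 16 cores. Index prefix and file names are hypothetical.

# Option A: 8 concurrent bowtie2 jobs with 2 threads each, then join outputs.
option_a() {
    local index=$1 out_bam=$2
    shift 2
    local fq
    for fq in "$@"; do
        # each lane runs as a background pipeline: bowtie2 -> BAM
        bowtie2 -p 2 -x "$index" -U "$fq" \
            | samtools view -b -o "${fq%.fastq}.bam" - &
    done
    wait   # block until every background alignment has finished
    local bams=()
    for fq in "$@"; do bams+=("${fq%.fastq}.bam"); done
    # concatenate the unsorted per-lane BAMs into one file
    samtools cat -o "$out_bam" "${bams[@]}"
}

# Option B: concatenate the lanes first, then one bowtie2 job on 16 threads.
option_b() {
    local index=$1 out_bam=$2
    shift 2
    cat "$@" > merged.fastq
    bowtie2 -p 16 -x "$index" -U merged.fastq \
        | samtools view -b -o "$out_bam" -
}
```

Wrapping each call in `time` (e.g. `time option_b genome_index sample.bam sample_L00{1..8}.fastq`) on the real data is the only way to get an actual speed figure; the sketch itself implies no timings.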

thanks,

kaston

Devon Ryan (Freiburg, Germany) wrote, 3.4 years ago:

The likely fastest option is the one you didn't mention: give the fastq files to bowtie2 as a comma-separated list and use all 16 cores. Bowtie2 will then produce a single SAM file directly. Also, pipe the output into a BAM conversion step instead of writing SAM; that's often faster, since I/O becomes the bottleneck as you increase the thread count.
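A minimal bash sketch of this suggestion, assuming single-end reads and samtools 1.x on the PATH. The helper names `join_by_comma` and `align_sample_to_bam`, the index prefix and the file names are hypothetical; for paired-end data the same kind of comma-separated lists would go to `-1` and `-2` instead of `-U`.

```shell
#!/usr/bin/env bash
# Sketch: align all lane FASTQs of one sample in a single bowtie2 run and
# stream the SAM output straight into BAM instead of writing it to disk.

# Join arguments with commas: join_by_comma a.fq b.fq -> a.fq,b.fq
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

align_sample_to_bam() {
    local index=$1 out_bam=$2
    shift 2
    local reads
    reads=$(join_by_comma "$@")
    # -U accepts a comma-separated list of unpaired read files;
    # -p sets the number of alignment threads (16 cores, as in the question).
    # Without -S, bowtie2 writes SAM to stdout, which samtools reads from "-".
    bowtie2 -p 16 -x "$index" -U "$reads" \
        | samtools view -b -o "$out_bam" -
}
```

For example: `align_sample_to_bam genome_index sample.bam sample_L00{1..8}.fastq`. The uncompressed SAM text then never touches the disk; only the compressed BAM does.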


Devon, your response time is staggering!  We'll try that, thanks!

Re: piping. We are currently writing the .sam files and then sorting and marking dups using Picard tools. The .sam files are ~200 GB in size and we have 70 GB of RAM available. I am assuming you think piping will avoid writing to disk, but is that possible with our file sizes and RAM constraints?

If piping is the way to go, we aren't sure what syntax to use.  For sorting, our command is:

java -Xmx30g -Djava.io.tmpdir=$TEMPDIR -jar $PICARDDIR/SortSam.jar \
    INPUT=$ALNDIR/$sample.sam \
    OUTPUT=$ALNDIR/${sample}_sorted.bam \
    SORT_ORDER=coordinate

How would we change this to pipe the output of bowtie2?

I expect your answer in under 30 seconds.

(reply by kaston, 3.4 years ago)

Sorry to disappoint on the quickness of my reply, I blame the 6-9 hour time difference.

Piping obviously won't completely avoid writing to disk, but it's typically faster to convert to BAM and write that to disk than to write the raw SAM file, since the compressed BAM is much smaller.

Regarding piping with Picard, see this post: Piping Input Into Picard Sortsam. In short, just use /dev/stdin as the input. You could also use samtools (e.g., bowtie2 ...stuff... | samtools view -Su - | samtools sort -T prefix -O BAM - > foo.sorted.bam), though unlike Picard (with CREATE_INDEX=true) it doesn't index during sorting, which is too bad.
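Applied to the SortSam command from the question, the /dev/stdin approach might look like the bash sketch below. It assumes the same old-style `SortSam.jar` and the question's variables (`$PICARDDIR`, `$TEMPDIR`, `$ALNDIR`, `$sample`); `INDEX`, `READS` (a comma-separated FASTQ list) and the function name `align_sort_picard` are hypothetical. `CREATE_INDEX=true` asks Picard to write the BAM index while sorting.

```shell
#!/usr/bin/env bash
# Sketch: stream bowtie2's SAM output directly into Picard SortSam via
# /dev/stdin, so no intermediate .sam file is written.
# INDEX and READS are hypothetical; the other variables come from the question.

align_sort_picard() {
    bowtie2 -p 16 -x "$INDEX" -U "$READS" \
        | java -Xmx30g -Djava.io.tmpdir="$TEMPDIR" -jar "$PICARDDIR/SortSam.jar" \
            INPUT=/dev/stdin \
            OUTPUT="$ALNDIR/${sample}_sorted.bam" \
            SORT_ORDER=coordinate \
            CREATE_INDEX=true
}
```

Picard still spills sorted chunks to `$TEMPDIR` when the data exceed the Java heap, which is the disk writing that piping cannot avoid.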

(reply by Devon Ryan, 3.4 years ago)