Parallelizing SAMTOOLS: how to make my command run faster?
7.8 years ago
SOHAIL ▴ 400

Hi Everyone,

I am trying to list the tags within my BAM file using the following command:

 samtools view input.bam | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '{ if(!x[$1]++) { print }}'

But it takes a very long time to output the tag names for larger BAMs. I have more than 30 BAMs from different WGS samples. Is there any way to optimize my command line so that it runs faster?

I have a cluster available, and we use PBS as the job scheduler.

I want my command to run fast. Please help!

Thanks!

next-gen sequencing samtools • 4.0k views

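# Submit one PBS job per BAM; \$ keeps $1 and $PBS_O_WORKDIR unexpanded until the job itself runs.
for i in *.bam ; do echo "cd \$PBS_O_WORKDIR ; samtools view '$i' | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '{ if(x[\$1]++ == 0) { print }}' > '$i.out'" | qsub ; done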

7.8 years ago
Steven Lakin ★ 1.8k

Although the bugs are still being worked out, you might consider using Sambamba instead. It was made to be a faster version of Samtools.


Is there any other solution using samtools?


Disk I/O generally cannot be parallelized; unless you're using an SSD, your hard drive can only read data so fast because of mechanical limitations in the hardware. Sambamba achieves speed-ups over Samtools by optimizing caching methods, but it is still I/O bound.
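If part of the time is going into the text-processing side rather than the disk, a rough sketch of what might still help with plain samtools: fold the `cut | tr | cut | awk` chain into a single awk pass. The function name `list_tags` is made up for illustration, and the `-@` thread count in the usage comment is an assumption; whether extra threads speed up reading depends on the samtools version and on the storage.

```shell
# list_tags (hypothetical helper): read SAM records on stdin and print each
# optional-field tag name once, in order of first appearance.
list_tags() {
    awk -F '\t' '
        /^@/ { next }                        # skip header lines, if any were emitted
        {
            for (i = 12; i <= NF; i++) {     # optional fields start at column 12
                split($i, parts, ":")        # TAG:TYPE:VALUE -> parts[1] is the tag
                if (!seen[parts[1]]++) print parts[1]
            }
        }'
}

# Usage (assumes samtools is on PATH; -@ requests additional worker threads):
#   samtools view -@ 4 input.bam | list_tags
```

If the reads are known to carry the same set of tags throughout the file (an assumption that does not always hold, for example for unmapped reads), piping through `head -n 1000000` before `list_tags` would avoid reading the whole BAM.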

7.8 years ago
SOHAIL ▴ 400

Is there any other solution using samtools?


Do you just want to list the different possible tag names, as in this answer? A: RGID mismatch after using MergeBamAlignment

If so, you can do this in SeQC pretty quickly. The reading of a single BAM file is not parallelized, but processing multiple BAMs at once is (because disk IO is always the bottleneck here).
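The same idea of handling several BAMs at once can be approximated with samtools alone on a single node. This is only a sketch: the function name `tags_for_all_bams`, the `.tags` output suffix, and the default of 4 concurrent jobs are assumptions for illustration, and whether it helps at all depends on how much concurrent reading the storage can sustain.

```shell
# tags_for_all_bams [JOBS] (hypothetical helper): list tag names for every
# *.bam in the current directory, JOBS files at a time, writing <file>.tags.
tags_for_all_bams() {
    max_jobs="${1:-4}"
    running=0
    for bam in *.bam; do
        [ -e "$bam" ] || continue            # no BAMs present: glob stayed literal
        # Same pipeline as in the question, run in the background
        samtools view "$bam" | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 |
            awk '!x[$1]++' > "$bam.tags" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                             # let the whole batch finish first
            running=0
        fi
    done
    wait                                     # collect the final, partial batch
}

# Usage: tags_for_all_bams 8
```

Jobs are started in batches, so one slow file holds up the rest of its batch; that keeps the sketch portable across shells at the cost of some idle time.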
