Question

Pindel too slow. How to speed it up?

5

9.5 years ago

chris ▴ 50

Hi,

I aligned five 5x mate-pair libraries with different insert sizes to a 300M genome and started Pindel, without the dispersed duplicates option, on all BAM files. It has been running for 10 days now on 30 cores and has processed around half of the genome.

Is there a way to speed up Pindel, e.g. by de-duplication? Is this the expected runtime?

I read about processing the chromosomes individually. How do you deal with interchromosomal duplication in that case?

chris

pindel • 3.5k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by chris ▴ 50

1

I have the same problem and would be interested in a solution.

ADD REPLY • link 9.5 years ago by Christian ★ 3.0k

1

Same here.

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 9.5 years ago by iraun 6.2k

Ram · Answer 1 · 2014-11-19

Are you using the newest Pindel code? You could split the genome into smaller segments (say 10 Mb) and run one job per segment with 4 cores each; this will not affect interchromosomal predictions.
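A minimal sketch of the splitting, assuming chromosome lengths are known (for example from the reference `.fai` index) and that your Pindel build accepts a region as `-c chr:start-end` and a thread count as `-T`; please check both against `pindel --help`. The file names `ref.fa` and `bam_config.txt`, the output directory, and the example lengths are placeholders, not values from this thread.

```python
from typing import Dict, List


def split_regions(chrom_lengths: Dict[str, int], window: int = 10_000_000) -> List[str]:
    """Cut each chromosome into consecutive windows and return
    region strings of the form chr:start-end (1-based, inclusive)."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append(f"{chrom}:{start}-{end}")
    return regions


def pindel_command(region: str,
                   reference: str = "ref.fa",          # placeholder path
                   config: str = "bam_config.txt",     # placeholder path
                   threads: int = 4,
                   out_dir: str = "pindel_out") -> List[str]:
    """Build one Pindel command line (as an argument list) for one region."""
    prefix = f"{out_dir}/" + region.replace(":", "_").replace("-", "_")
    return ["pindel", "-f", reference, "-i", config,
            "-c", region, "-T", str(threads), "-o", prefix]


if __name__ == "__main__":
    # Hypothetical chromosome lengths, for illustration only.
    lengths = {"chr1": 25_000_000, "chr2": 8_000_000}
    for region in split_regions(lengths):
        print(" ".join(pindel_command(region)))
```

Each printed line can be submitted as a separate cluster job; the per-region output files then need to be concatenated afterwards.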

On exome data, I have sometimes seen coverage spikes (several thousand-fold) in a few narrow regions; these cause Pindel to slow down and run into memory issues. You might use -J to exclude those regions. I have tried to solve this but have not found a smart way to handle huge coverage variation in the data.
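One possible way to build the exclusion file for -J is sketched below. It assumes per-base depth rows of the form (chromosome, 1-based position, depth), as produced by `samtools depth`, sorted by chromosome and position, and it assumes Pindel takes a BED-like file with chromosome, start and end columns. The function name and the threshold of 1000 are illustrative choices, not a recommendation from the Pindel authors.

```python
from typing import Iterable, List, Tuple


def high_coverage_bed(depth_rows: Iterable[Tuple[str, int, int]],
                      max_depth: int = 1000) -> List[Tuple[str, int, int]]:
    """Merge runs of adjacent positions whose depth exceeds max_depth
    into BED intervals (0-based start, exclusive end)."""
    intervals = []
    cur_chrom, cur_start, cur_end = None, 0, 0
    for chrom, pos, depth in depth_rows:
        if depth <= max_depth:
            continue
        if chrom == cur_chrom and pos == cur_end + 1:
            cur_end = pos  # extend the current run
        else:
            if cur_chrom is not None:
                intervals.append((cur_chrom, cur_start - 1, cur_end))
            cur_chrom, cur_start, cur_end = chrom, pos, pos
    if cur_chrom is not None:
        intervals.append((cur_chrom, cur_start - 1, cur_end))
    return intervals


if __name__ == "__main__":
    # Made-up depth rows, for illustration only.
    rows = [("chr1", 100, 50), ("chr1", 101, 5000), ("chr1", 102, 6000),
            ("chr1", 103, 20), ("chr1", 200, 3000), ("chr2", 1, 2000)]
    for chrom, start, end in high_coverage_bed(rows):
        print(f"{chrom}\t{start}\t{end}")
```

The printed lines can be redirected to a file and passed to Pindel with -J; in practice you may want to pad each interval by a read length or so.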

kai

Ram · Answer 2 · 2014-11-19

If you are interested in indels, I suggest trying BBMap. It allows alignment across long indels, so you can detect them with a simple mapping plus pileup rather than an expensive realignment. It also allows an arbitrarily large number of indels or substitutions per read, so it has very high sensitivity to multiple events.

Note - while it can detect deletions of arbitrary length (depending on the maxindel flag), it cannot detect insertions longer than ~50% of the read length, since it only uses information from individual reads. So it's not a complete replacement for Pindel.