Question: Is Pindel slow for everyone, or should I review my command?
Hi all,

It has been almost a month since I started five pindel2vcf runs to convert the output of Pindel, which itself took more than a month to finish. I am running it on whole-genome data, so the amount of data is considerable. However, I have not found it written anywhere that the program is not meant for whole-genome analyses.

My command was:

time { pindel -f reference.fa --config-file filename.config --output-prefix whatever --chromosome ALL --number_of_threads 12 --max_range_index 4 --report_inversions --report_duplications --report_long_insertions --report_breakpoints --report_close_mapped_reads --min_inversion_size 50 &> STDERR/pindel.stderr; } &> TIME/pindel.time & disown

Apart from the 12 threads (couldn't help it), can the speed be improved by adding some option that I am not aware of? Am I wrong to use it for whole-genome analyses?

My pindel2vcf command was:

time { pindel2vcf -p pindel_output_file -r reference.fa -R name_and_version -d date -v FINAL/deletions.vcf -mc 10 -he 0.2 -ho 0.8 --both_strands_supported --min_supporting_reads 4 --max_supporting_reads 50 &> STDERR/deletions.vcf.stderr; } &> TIME/deletions.vcf.time & disown

Can this also be sped up?

How large is the Pindel output file? Does your CPU support at least 12 parallel threads? Did you have free RAM at all times? If you had processed the chromosomes individually, you could have run pindel2vcf on the output files in parallel. I've never used Pindel, so I can't really comment on your command-line arguments.
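
To make the per-chromosome idea concrete, here is a minimal dry-run sketch. It only prints the commands and does not run Pindel. The assumptions are mine and untested: that `--chromosome` accepts a single sequence name in place of `ALL`, that the chromosome names listed match those in reference.fa, that the `whatever.<chrom>` prefix is an acceptable naming scheme, and that the deletions file is the one ending in `_D`. The remaining options from the original commands would need to be appended.

```shell
#!/usr/bin/env bash
# Dry run: print one pindel and one pindel2vcf command per chromosome.
# Nothing is executed; each printed line is meant to become its own job.

# Hypothetical chromosome names; they must match the names in reference.fa.
chroms=(chr1 chr2 chr3)

pindel_cmd() {
  # $1 = chromosome name. The prefix "whatever.<chrom>" is an assumed naming scheme.
  printf 'pindel -f reference.fa --config-file filename.config --output-prefix whatever.%s --chromosome %s --number_of_threads 12\n' "$1" "$1"
}

pindel2vcf_cmd() {
  # $1 = chromosome name. Assumes the deletions output is "<prefix>_D".
  printf 'pindel2vcf -p whatever.%s_D -r reference.fa -R name_and_version -d date -v FINAL/deletions.%s.vcf\n' "$1" "$1"
}

for c in "${chroms[@]}"; do
  pindel_cmd "$c"
  pindel2vcf_cmd "$c"
done
```

On a cluster, each printed line could be submitted as a separate job, so that the conversion of one chromosome does not have to wait for the others.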

The resources are not a problem: I'm working on a fairly large cluster with many cores and plenty of memory always available. I think the problem is more related to Pindel itself.

