8.2 years ago
tonja.r
I would like to use bedtools, as it seems to offer many more options for input/output files. However, bedtools appears to be much slower than samtools.
I have multiple BAM files (sorted and indexed) and multiple regions, and I want per-base coverage. gene_coord_red.bed contains only one region. I ran bedtools coverageBed as follows:
time coverageBed -a gene_coord_red.bed -b reads.sort.bam -d > bedtools.txt
real 2m17.861s
user 1m57.572s
sys 0m19.926s
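For readers unfamiliar with -d: coverageBed writes one output line per base of each -a interval, appending the 1-based position within the interval and the depth at that base. The awk sketch below is only a toy illustration of that output layout, not how bedtools computes it. The helper name per_base_depth is hypothetical, and it assumes the reads are already available as BED intervals (0-based, end-exclusive) and counts each read's full start-to-end span.

```shell
# per_base_depth (hypothetical helper): toy version of `coverageBed -d`
# for a single region. Arguments: chrom, start, end (BED-style, 0-based,
# end-exclusive). Reads arrive on stdin as BED lines (chrom, start, end).
# Prints one line per base: chrom, start, end, 1-based offset, depth.
per_base_depth() {
    awk -v c="$1" -v s="$2" -v e="$3" '
        BEGIN { OFS = "\t"; s += 0; e += 0 }
        # Clip each read on the region chromosome to the region and
        # add 1 to the depth of every base it covers.
        $1 == c {
            rs = $2 + 0; re = $3 + 0
            lo = (rs > s) ? rs : s
            hi = (re < e) ? re : e
            for (p = lo; p < hi; p++) depth[p]++
        }
        # Emit every base of the region, including zero-depth bases.
        END {
            for (p = s; p < e; p++)
                print c, s, e, p - s + 1, depth[p] + 0
        }'
}
```

Usage, with hypothetical coordinates: bedtools bamtobed -i reads.sort.bam | per_base_depth chr1 1000 2000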
Then I ran samtools mpileup:
time samtools mpileup -Q 0 -l gene_coord_red.bed reads.sort.bam > mpileup.txt
real 0m23.595s
user 0m22.808s
sys 0m0.301s
Why is bedtools so slow? Can I speed it up somehow?
Bedtools usually has a -sorted option (or something similar) that tends to speed things up when the BED/BAM files are sorted.

It has. However, if I sort my BAM file with samtools sort, bedtools says:

If I convert the .bam file into .bed and then run bedtools with -sorted, the run takes 1m11.249s, which is still slow compared with mpileup.

I'm not sure that bedtools takes advantage of being able to jump around randomly in the BAM file to get the alignments intersecting each entry in the BED file. This would explain why you see similar times for BAM and BED files, since the latter don't allow random access either (well, not without something like tabix, and bedtools doesn't use that).
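Two possible workarounds follow from the comments above; neither has been timed here, and both are sketches under stated assumptions. First, -sorted needs the BED and BAM inputs to share one chromosome order. samtools sort keeps the order of the BAM header, which may differ from the plain `sort -k1,1 -k2,2n` order bedtools expects by default, so an order mismatch is one plausible cause of the error (its text is not shown above, so this is an assumption). Passing a genome file built from the BAM header via -g tells bedtools which order to expect. Second, following the random-access point, samtools view can use the .bai index to pull only the alignments inside the BED regions, so coverageBed only has to scan a small BAM. The helper names sam_header_to_genome and bed_to_regions, and the files genome.txt and region.bam, are hypothetical.

```shell
# sam_header_to_genome (hypothetical helper): read a SAM header on stdin
# and print a bedtools genome file (chrom<TAB>length), keeping the @SQ
# order, which is the order samtools sort used.
sam_header_to_genome() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        $1 == "@SQ" {
            sn = ""; ln = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) sn = substr($i, 4)
                else if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            if (sn != "" && ln != "") print sn, ln
        }'
}

# bed_to_regions (hypothetical helper): turn BED lines on stdin into
# samtools region strings. BED starts are 0-based and ends exclusive,
# samtools regions are 1-based and inclusive, so only the start gets +1.
# Track, browser and comment lines are skipped.
bed_to_regions() {
    awk -F '\t' '
        /^(#|track|browser)/ { next }
        NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Example use (assumes samtools and bedtools on PATH and a .bai index
# next to reads.sort.bam; the BED must follow the same chromosome order):
#
#   samtools view -H reads.sort.bam | sam_header_to_genome > genome.txt
#   coverageBed -a gene_coord_red.bed -b reads.sort.bam -d -sorted -g genome.txt
#
#   samtools view -b reads.sort.bam $(bed_to_regions < gene_coord_red.bed) > region.bam
#   coverageBed -a gene_coord_red.bed -b region.bam -d
```

With several BED regions that overlap, samtools view can write a read once per region it overlaps, which would inflate the depth; merging the regions first (for example with bedtools merge) or using samtools view -M avoids that.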
Hello tonja.rand!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62741
This is typically not recommended as it runs the risk of annoying people in both communities.