bedtools coverageBed vs samtools mpileup, speed
7.2 years ago
tonja.r ▴ 600

I would like to use bedtools because it seems to offer many more options for input and output files. However, bedtools appears to be much slower than samtools.

I have multiple BAM files (sorted and indexed) and multiple regions, and I want per-base coverage. For this test, gene_coord_red.bed contains only one region. I ran bedtools coverageBed as follows:

time coverageBed -a gene_coord_red.bed -b reads.sort.bam -d > bedtools.txt
real    2m17.861s
user    1m57.572s
sys    0m19.926s


I ran samtools mpileup:

time samtools mpileup -Q 0 -l gene_coord_red.bed reads.sort.bam > mpileup.txt
real    0m23.595s
user    0m22.808s
sys    0m0.301s


Why is bedtools running so slowly? Can I speed it up somehow?

sequencing • 3.2k views

Bedtools usually has a -sorted option, or something like it, that tends to speed things up when the BED/BAM files are sorted.


It does. However, if I sort my BAM file with samtools sort, bedtools says:

ERROR: Sort order was unspecified, and file sorted_out.bam is not sorted lexicographically.


If I convert the .bam file to .bed and then run bedtools with -sorted, the run takes 1m11.249s, which is still slow compared with mpileup.
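One possible way around the sort-order error quoted above is to give bedtools a genome file through -g, listing the chromosomes in the same order as the BAM header, so that -sorted no longer assumes lexicographic order. The following is only a sketch under that assumption: header_to_genome and genome.txt are hypothetical names introduced here, the file names are taken from the question, and the command uses the newer "bedtools coverage" spelling of coverageBed. Whether this closes the gap to mpileup was not measured.

```shell
# Build a bedtools "genome" file (chrom<TAB>length) from SAM header text on stdin.
# Only @SQ lines are used, and their order is preserved, so the result matches
# the chromosome order of a coordinate-sorted BAM.
# header_to_genome is a hypothetical helper name, not part of samtools or bedtools.
header_to_genome() {
  awk -F'\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) name = substr($i, 4)   # sequence name
      if ($i ~ /^LN:/) len  = substr($i, 4)   # sequence length
    }
    if (name != "" && len != "") print name "\t" len
  }'
}

# Run the tools only if they are installed and the input from the question exists.
if command -v samtools >/dev/null 2>&1 && command -v bedtools >/dev/null 2>&1 \
   && [ -f reads.sort.bam ] && [ -f gene_coord_red.bed ]; then
  samtools view -H reads.sort.bam | header_to_genome > genome.txt
  # The BED file must list its regions in the same chromosome order as genome.txt.
  bedtools coverage -sorted -g genome.txt \
    -a gene_coord_red.bed -b reads.sort.bam -d > bedtools.txt
fi
```

With a single region in gene_coord_red.bed the BED file is trivially in the right order; with several regions it would need to be sorted to match genome.txt first.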


I'm not sure that bedtools takes advantage of being able to randomly jump around in the BAM file to get the alignments intersecting each entry in the BED file. This would explain why you have similar times for BAM and BED files, since the latter also don't allow random access (well, without using something like tabix, but bedtools isn't using that).
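If the missing random access is indeed the bottleneck, one workaround might be to let samtools do the indexed lookup and hand only the overlapping alignments to bedtools. This is a rough sketch, not a tested benchmark: bed_to_regions is a hypothetical helper, the file names come from the question, and it assumes reads.sort.bam has a .bai index next to it.

```shell
# Convert BED intervals (0-based, half-open) into samtools region strings
# (1-based, inclusive), one per line. Header-like lines are skipped.
# bed_to_regions is a hypothetical helper name introduced for this sketch.
bed_to_regions() {
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 {
    printf "%s:%d-%d\n", $1, $2 + 1, $3
  }' "$1"
}

# Run the tools only if they are installed and the inputs from the question exist.
if command -v samtools >/dev/null 2>&1 && command -v bedtools >/dev/null 2>&1 \
   && [ -f reads.sort.bam ] && [ -f gene_coord_red.bed ]; then
  # samtools view uses the BAM index to seek to each region, so only the
  # alignments overlapping the BED intervals are decompressed and streamed.
  # Region strings are left unquoted on purpose so each becomes its own argument.
  samtools view -b reads.sort.bam $(bed_to_regions gene_coord_red.bed) \
    | bedtools coverage -a gene_coord_red.bed -b stdin -d > bedtools.txt
fi
```

This suits the single-region case from the question. With several regions, a read that overlaps more than one of them can be emitted more than once by samtools view, which would inflate the per-base counts, so nearby or overlapping regions would need extra care.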


Hello tonja.rand!

This is typically not recommended as it runs the risk of annoying people in both communities.