Faster (perhaps random) access to BAM files (for collecting statistics like average read depth)
6.5 years ago
rightmirem ▴ 70

The title is not very clear. This is kind of two questions...

Basically, I want to do things like gather statistical information from a BAM (not SAM) file, so it's compressed, which is part of the problem.

  1. The immediate need is: I'd like to get the average depth for the entire file.

  2. The larger question is...

To get #1, I used...

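samtools depth my.bam | {
    sum=0; n=0
    while read -r chrom pos depth; do
        sum=$((sum + depth)); n=$((n + 1))   # tally positions and sum their depths
    done
    [ "$n" -gt 0 ] && echo "positions: $n, mean depth: $((sum / n))"   # integer division
}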

It took FOREVER!!

I was thinking I don't need to average EVERY line. Even if I took every 1000th or 10,000th line, I will get a good enough estimate.

BUT the issue is that, since I must first run the BAM file through samtools view, using things like awk, sed, or even the file pointer to try to pull JUST every 1000th line actually takes longer than the above.

Is there a way around this?

Thanks!

EDIT

I'm vaguely familiar with samtools stats, but it didn't seem to have the info I was looking for (specifically, the above). If there's a way to tweak it to get what I need... that's awesome.

SNP SAM BAM • 1.9k views

Have you looked at mosdepth?
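
For reference, a minimal sketch of how mosdepth could be used to get the mean depth. It assumes mosdepth is on the PATH, that my.bam is indexed, and that your mosdepth version writes a <prefix>.mosdepth.summary.txt file with a "total" row whose fourth column is the mean (check this against your version). The helper name mosdepth_total_mean and the prefix my_prefix are made up for illustration.

```shell
# Hypothetical helper: print the mean depth from the "total" row of a
# mosdepth summary file (assumed columns: chrom length bases mean min max).
mosdepth_total_mean() {
    awk -F'\t' '$1 == "total" { print $4 }' "$1"
}

# Usage (assumes my.bam is indexed; -n skips per-base output, -t sets threads):
#   mosdepth -n -t 4 my_prefix my.bam
#   mosdepth_total_mean my_prefix.mosdepth.summary.txt
```

Skipping the per-base output with -n is what should keep this much cheaper than piping every position through a shell loop.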

6.5 years ago
GenoMax 142k

Take a look at pileup.sh from the BBMap suite. It can calculate all sorts of stats from BAM files. If you have pigz installed, it supports parallel gunzip, which speeds things up.

6.5 years ago
yhoogstrate ▴ 140

If your indexes are in place you might consider parsing the output of samtools idxstats.
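
A rough sketch of what that parsing might look like. samtools idxstats prints one tab-separated line per reference (name, length, mapped reads, unmapped reads) straight from the index, so no alignment data is decompressed. The estimate below is mapped reads × read length / total reference length; the read length is not in the index, so it is an assumption you have to supply, and clipping or partially aligned reads are ignored. The helper name idxstats_mean_depth is made up for illustration.

```shell
# Hypothetical helper: estimate mean depth from `samtools idxstats` output on stdin.
# $1 = assumed average read length in bases (not stored in the index).
idxstats_mean_depth() {
    awk -F'\t' -v readlen="$1" '
        $1 != "*" { ref_len += $2; mapped += $3 }   # skip the unplaced-reads row
        END { if (ref_len > 0) printf "%.2f\n", mapped * readlen / ref_len }
    '
}

# Usage (assumes my.bam has its index alongside it, and 150 bp reads):
#   samtools idxstats my.bam | idxstats_mean_depth 150
```

This is only an approximation of the genome-wide average, but it should come back almost immediately.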

6.5 years ago

sambamba depth is parallelized and quicker; set it to check windows of, e.g., 1 Mbp. That will be quick. Otherwise, one of the Picard tools might help.
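
The same windowing idea can be sketched with plain samtools instead of sambamba (a swapped-in tool, not the command suggested above): samtools depth -r uses the BAM index to jump straight to a region, so averaging a handful of windows avoids reading the whole file. The region names below are placeholders and mean_depth is a made-up helper name; how many windows give a good enough estimate depends on how uniform your coverage is.

```shell
# Hypothetical helper: mean of column 3 (depth) over `samtools depth` lines on stdin.
mean_depth() {
    awk '{ sum += $3; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage: sample a few 1 Mbp windows (placeholder regions) instead of the whole file.
# -r jumps to the region via the index; -a also reports zero-coverage positions.
#   for region in chr1:1-1000000 chr5:20000001-21000000; do
#       samtools depth -a -r "$region" my.bam
#   done | mean_depth
```

Doing the averaging in awk rather than in a shell while-read loop also tends to be far faster on its own.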
