Question: Faster way to get coverage per base
1
gravatar for finswimmer
22 months ago by
finswimmer8.8k
Germany
finswimmer8.8k wrote:

Hello,

I would like to get the coverage for every single base in all regions in a bed file. For this I use samtools now

samtools depth -aa -b regions.bed input.bam

This all works as excpected. But the more regions I have the longer this tasks takes of course. I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete. My whole analyse pipeline from fastq to an annotated vcf file runs only 1h for this data set.

So I wonder if there is a faster way for this task?

fin swimmer

sam alignment • 997 views
ADD COMMENTlink modified 28 days ago • written 22 months ago by finswimmer8.8k

Hi , If you have enough resources you can split your bam for example by chromosome and run samtools for each splited bam ?

ADD REPLYlink written 22 months ago by Titus770

Should there be any reasonable differences to split the bam file and not the bed file with the regions of interests?

I thought with the bam index it should be possible to have random access directly to the regions?

fin swimmer

ADD REPLYlink written 22 months ago by finswimmer8.8k

The BBMap package has a tool "pileup.sh" which is quite fast, and produces per-base coverage from an unsorted sam or bam file. I'm not sure if this answers your question, though, as it's not clear to me whether you want bed as input or output, or need bed at all.

ADD REPLYlink written 22 months ago by Brian Bushnell16k

pileup.sh seems to need lot of memory which I have to little (around 6GB) :(

The bed file is just needed to truncate the bam file to the region of interest. If I can pipe to pileup I can do this before with samtools view. So can I? Didn't find it in the documentation.

fin swimmer

ADD REPLYlink written 22 months ago by finswimmer8.8k

May be i didn't explain well my idea. You can split your bed and run multiple samtools in the same times. It will divide the calculation time in a lot little job ... Well you should try i m not sure about that but it make sens for me.

ADD REPLYlink written 22 months ago by Titus770

Hello Titus, this is what I meant with

I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete.

:)

ADD REPLYlink written 21 months ago by finswimmer8.8k

sorry my bad i read too fast :)

ADD REPLYlink written 21 months ago by Titus770

Pileup will only allocate memory for chromosomes that actually have reads mapped to them, so yes.

ADD REPLYlink written 21 months ago by Brian Bushnell16k
4
gravatar for Jeffin Rockey
22 months ago by
Jeffin Rockey1000
Karimannoor
Jeffin Rockey1000 wrote:

Try bedtools coverage with -d option (link)

(And do notice the first note on -sorted)

ADD COMMENTlink modified 22 months ago • written 22 months ago by Jeffin Rockey1000

I'used bedtools before, but put it aside because of high memory usage (I just have have 6GB). But with the -sorted parameter it seemes to work at least with my small test dataset. I will will check it today on my large dataset.

Thank you.

fin swimmer

ADD REPLYlink written 22 months ago by finswimmer8.8k
1

Could test it yesterday and it works with the -sorted option much faster than with samtools.

Thanks a lot.

fin swimmer

ADD REPLYlink written 21 months ago by finswimmer8.8k
2
gravatar for finswimmer
28 days ago by
finswimmer8.8k
Germany
finswimmer8.8k wrote:

To add in up-to-date solution:

mosdepth is great:

$ mosdepth --by capture.bed sample-output sample.exome.bam

fin swimmer

ADD COMMENTlink modified 28 days ago • written 28 days ago by finswimmer8.8k
1
gravatar for Vivek Todur
22 months ago by
Vivek Todur50
Bangalore
Vivek Todur50 wrote:

Sambamba is high performance tool, similar to samtools. Here depth is multi threaded, you can take advantage of it...

Cheers...!

ADD COMMENTlink written 22 months ago by Vivek Todur50

Never heard about sambamba before, but I will try it.

fin swimmer

ADD REPLYlink written 22 months ago by finswimmer8.8k

Did you get chance to try sambamba..? BTW latest version of samtools (1.4) got released with multi thread option... Just check out out...

ADD REPLYlink written 21 months ago by Vivek Todur50

I didn't try sambamba until now.

But I recognize that picards CollectHsMetrics have an PER_BASE_COVERAGE option. As I'm running CollectHsMetrics in any case I could use this option.

fin swimmer

ADD REPLYlink written 21 months ago by finswimmer8.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1627 users visited in the last hour