Question: Faster way to get coverage per base
1
gravatar for finswimmer
20 months ago by
finswimmer7.0k
Germany
finswimmer7.0k wrote:

Hello,

I would like to get the coverage for every single base in all regions in a bed file. For this I use samtools now

samtools depth -aa -b regions.bed input bam

This all works as excpected. But the more regions I have the longer this tasks takes of course. I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete. My whole analyse pipeline from fastq to an annotated vcf file runs only 1h for this data set.

So I wonder if there is a faster way for this task?

fin swimmer

sam alignment • 859 views
ADD COMMENTlink modified 20 months ago by Vivek Todur50 • written 20 months ago by finswimmer7.0k

Hi , If you have enough resources you can split your bam for example by chromosome and run samtools for each splited bam ?

ADD REPLYlink written 20 months ago by Titus740

Should there be any reasonable differences to split the bam file and not the bed file with the regions of interests?

I thought with the bam index it should be possible to have random access directly to the regions?

fin swimmer

ADD REPLYlink written 20 months ago by finswimmer7.0k

The BBMap package has a tool "pileup.sh" which is quite fast, and produces per-base coverage from an unsorted sam or bam file. I'm not sure if this answers your question, though, as it's not clear to me whether you want bed as input or output, or need bed at all.

ADD REPLYlink written 20 months ago by Brian Bushnell16k

pileup.sh seems to need lot of memory which I have to little (around 6GB) :(

The bed file is just needed to truncate the bam file to the region of interest. If I can pipe to pileup I can do this before with samtools view. So can I? Didn't find it in the documentation.

fin swimmer

ADD REPLYlink written 20 months ago by finswimmer7.0k

May be i didn't explain well my idea. You can split your bed and run multiple samtools in the same times. It will divide the calculation time in a lot little job ... Well you should try i m not sure about that but it make sens for me.

ADD REPLYlink written 20 months ago by Titus740

Hello Titus, this is what I meant with

I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete.

:)

ADD REPLYlink written 20 months ago by finswimmer7.0k

sorry my bad i read too fast :)

ADD REPLYlink written 20 months ago by Titus740

Pileup will only allocate memory for chromosomes that actually have reads mapped to them, so yes.

ADD REPLYlink written 20 months ago by Brian Bushnell16k
3
gravatar for Jeffin Rockey
20 months ago by
Jeffin Rockey760
Karimannoor
Jeffin Rockey760 wrote:

Try bedtools coverage with -d option (link)

(And do notice the first note on -sorted)

ADD COMMENTlink modified 20 months ago • written 20 months ago by Jeffin Rockey760

I'used bedtools before, but put it aside because of high memory usage (I just have have 6GB). But with the -sorted parameter it seemes to work at least with my small test dataset. I will will check it today on my large dataset.

Thank you.

fin swimmer

ADD REPLYlink written 20 months ago by finswimmer7.0k
1

Could test it yesterday and it works with the -sorted option much faster than with samtools.

Thanks a lot.

fin swimmer

ADD REPLYlink written 20 months ago by finswimmer7.0k
1
gravatar for Vivek Todur
20 months ago by
Vivek Todur50
Bangalore
Vivek Todur50 wrote:

Sambamba is high performance tool, similar to samtools. Here depth is multi threaded, you can take advantage of it...

Cheers...!

ADD COMMENTlink written 20 months ago by Vivek Todur50

Never heard about sambamba before, but I will try it.

fin swimmer

ADD REPLYlink written 20 months ago by finswimmer7.0k

Did you get chance to try sambamba..? BTW latest version of samtools (1.4) got released with multi thread option... Just check out out...

ADD REPLYlink written 19 months ago by Vivek Todur50

I didn't try sambamba until now.

But I recognize that picards CollectHsMetrics have an PER_BASE_COVERAGE option. As I'm running CollectHsMetrics in any case I could use this option.

fin swimmer

ADD REPLYlink written 19 months ago by finswimmer7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1305 users visited in the last hour