Question: Depth of Coverage without Read Overlap
gravatar for drkennetz
23 months ago by
drkennetz500 wrote:

Hi guys, thanks in advance for help.

I currently calculate the on target depth of coverage for a sample by using samtools depth:

samtools depth -a -b bedfile.bed dupmrked_sorted_bamfile.bam > output_bamdepth.txt

This outputs all bases' depth (even those with 0 coverage) within the regions in the bedfile, excluding duplicates.

To get a mean value I just run:

awk '{s+=$3}END{print s/NR}' output_bamdepth.txt

Which sums the depth from the samtools output and divides by the number of covered bases which would give me an average depth.

My issue is that the input reads are paired end reads. By the time we sort and mark duplicates with picard, the final bam file is in single read format, and the read could be forward or reverse. This means that a position may be covered by both the forward and reverse read and therefore may be double counted sometimes, and sometimes it may not.

This is just as much a biological problem as it is a computer problem. We have not tried to optimize for shearing size in comparison to read length which would end up providing us with a sweet spot.

If we have a fragment for sequencing that is 250BP and we do paired end 150 Sequencing, then R1 will hit some of the read, and R2 will hit some of the read and then some of the read will be overlapped by R1 and R2 (from the same read pair). This overlap region will give false double coverage. Does anyone have a way for overcoming this?

Thanks, Dennis

next-gen • 776 views
ADD COMMENTlink modified 23 months ago by Vitis2.4k • written 23 months ago by drkennetz500

You could merge-overlap the paired end reads (eg with FLASH, BBmerge, ...) and then use the merged reads and process them as single end

ADD REPLYlink written 23 months ago by lieven.sterck8.7k

I appreciate that, but I am not looking to modify the reads or generate new data, I think this can be done using bam header information but I am not sure just how.

ADD REPLYlink written 23 months ago by drkennetz500
gravatar for Vitis
23 months ago by
New York
Vitis2.4k wrote:

According to this post

Samtools Mpileup And Overlapping Paired-End Reads

"samtools mpileup", after version 1.X, doesn't double-count the overlapping read pairs. I don't know the default behavior of "samtools depth", but maybe a small test could be set up to see.

ADD COMMENTlink written 23 months ago by Vitis2.4k

Thank you for this! It is definitely something I will look into. I'll accept this if I find documentation similar for depth, or figure out how to do the same type of analysis using mpileup.

ADD REPLYlink written 23 months ago by drkennetz500
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1673 users visited in the last hour