read depth using samtools
1
2
Entering edit mode
3.9 years ago
LimMo ▴ 20

Hello all,

I'm using samtools in order to get the depth of a file called file.bam, but the output I have is like this:

samtools depth file.bam > output.txt

11  88911833    2
11  88911834    2
11  88911835    2

I understand that the first column is the name of the reference sequence, the second column is the base index within the reference, and the third column is the depth of coverage for that base.

But I need at the end the read depth represented like 100X, 150X and something like this.

Is there any calculation should I follow/apply to the output file?

depth samtools • 23k views
ADD COMMENT
0
Entering edit mode

I think you want the average depth of your sequencing data, to do that you can simply take the distribution of individual bases depths and compute the mean.

ADD REPLY
13
Entering edit mode
3.9 years ago

You can calculate the average coverage (for covered bases):

samtools depth  *bamfile*  |  awk '{sum+=$3} END { print "Average = ",sum/NR}'

This would be average coverage for all covered regions. If you want to include regions that were not covered in this calculation you need ot use: samtools depth -a

To get your average X coverage you would need to divide by the total size of your genome, instead of dividing by NR in the command above.

The total size can be calculated like this:

samtools view -H *bamfile* | grep -P '^@SQ' | cut -f 3 -d ':' | awk '{sum+=$1} END {print sum}'

Then you would redo the calculation substituting this in for NR.

I stole all of this from: Tools To Calculate Average Coverage For A Bam File?

ADD COMMENT
0
Entering edit mode

Thanks a lot for your help, I will try that :)

ADD REPLY
0
Entering edit mode

Hello, if I get from this calculation, let's say 1.7554 coverage, accounting for 0 depth reads. And 23 for non 0 reads. The coverage then (to represent as X) should be 1.7554*100=175.54 X?

ADD REPLY
0
Entering edit mode

I have the same question here. I'm getting result between 1.6-1.8 with this method. Does it mean my BAM files have this much low average X coverages? It's not supposed to be this low.

ADD REPLY
0
Entering edit mode

You need to divide by the total number of bases you expect to be covering with your sequencing reads to get X.

ADD REPLY
0
Entering edit mode

Okay. Then I guess this method works for WGS BAM files only. Because the genome size the second command line retrieves is from the reference sequence genome size, and WES BAM files only carry exonic sequences, not the whole genome like the reference sequence.

ADD REPLY
2
Entering edit mode

It will work with any type of sequencing. By default if you follow the answer line by line, the sum will take into account all bases covered in your BAM file. If you did WES, this is likely going to include off-target sites so it will probably deflate your average coverage value. You need to manually replace NR (as described below in the code block) with the total number of bases covered by your WES kit or more ideally provide -b to the samtools depth command and give it a BED file of your WES target regions.

ADD REPLY
0
Entering edit mode

I downloaded a Regions.bed file from the Agilent Suredesign site and used it to estimate coverage by BEDtools. I didn't know about the "samtools depth -b BEDfile.bed" option so thank you very much! The coverage values in SAMtools with a BED file are roughly 9-10 points higher than the results from BEDtools (coverage around 50-60). So this is the actual average X coverage I believe. Again thanks a bunch!

ADD REPLY

Login before adding your answer.

Traffic: 2713 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6