Genozip calculates coverage a bit differently, based on the number of bases rather than reads - ignoring hard-clipped and soft-clipped bases, as well as unmapped, secondary, supplementary and duplicate alignments. It can also calculate approximate coverage directly on a FASTQ file.
 $ genocat --coverage my-sample.bam.genozip
--coverage for: my-sample.bam.genozip
Contig         LN        Reads        Bases       Bases   Depth
1              249.3 Mb  61,215,549   9.0 Gb      7.1 %   36.10
2              243.2 Mb  63,342,488   9.3 Gb      7.3 %   38.26
3              198.0 Mb  50,819,357   7.5 Gb      5.9 %   37.76
4              191.2 Mb  48,131,726   7.1 Gb      5.6 %   37.04
5              180.9 Mb  46,495,225   6.8 Gb      5.4 %   37.81
6              171.1 Mb  43,659,445   6.4 Gb      5.0 %   37.53
7              159.1 Mb  41,171,496   6.0 Gb      4.7 %   38.01
8              146.4 Mb  38,081,676   5.6 Gb      4.4 %   38.27
9              141.2 Mb  31,755,198   4.7 Gb      3.7 %   33.05
10             135.5 Mb  34,954,121   5.1 Gb      4.0 %   37.90
11             135.0 Mb  35,234,129   5.2 Gb      4.1 %   38.38
12             133.9 Mb  34,523,219   5.1 Gb      4.0 %   37.91
13             115.2 Mb  24,526,664   3.6 Gb      2.8 %   31.32
14             107.3 Mb  23,525,557   3.5 Gb      2.7 %   32.21
15             102.5 Mb  22,613,925   3.3 Gb      2.6 %   32.41
16             90.4 Mb   23,675,707   3.5 Gb      2.7 %   38.41
17             81.2 Mb   22,148,439   3.2 Gb      2.5 %   40.01
18             78.1 Mb   19,542,517   2.9 Gb      2.3 %   36.81
19             59.1 Mb   16,374,766   2.4 Gb      1.9 %   40.50
20             63.0 Mb   16,530,922   2.4 Gb      1.9 %   38.52
21             48.1 Mb   9,811,790    1.4 Gb      1.1 %   29.93
22             51.3 Mb   10,238,901   1.5 Gb      1.2 %   29.26
X              155.3 Mb  20,421,365   3.0 Gb      2.3 %   19.29
Y              59.4 Mb   3,971,007    582.0 Mb    0.5 %    9.80
MT             16.6 Kb   105,906      15.5 Mb     0.0 %  937.41
Other contigs            28,989,258   4.1 Gb      3.2 %
-----
All contigs              771,860,353  113.3 Gb    88.8%   36.60
Soft clip                             1.2 Gb      0.9 %
Unmapped                 17,071,958   2.4 Gb      1.9 %
Supplementary            1,028,655    52.2 Mb     0.0 %
Duplicate                71,068,615   10.6 Gb     8.3 %
TOTAL                    861,029,581  127.5 Gb    100.0%
Documentation: https://genozip.com/coverage.html
Installing: https://genozip.com/installing.html
Publication: https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor
                    
                
                 
People seem to be equally devided between GATK, Samtools and BedTools.
The Depth of Coverage is depreciated in GATK4. It's only available in GATK3. See here
DepthOfCoverage is available as beta https://gatk.broadinstitute.org/hc/en-us/articles/360041851491-DepthOfCoverage-BETA-#--intervals
This is what I ended up using, after trying bedtools
Most answers seems to be very old and hence would like to have updated suggestions.
I have 6 bam files and I have used
samtools depthto calculate chromosome wise depth for all chromosomes and then plotted in R. While looking at the plots at 2-3 places, depth shows upto 200-3500 and hence I would like to calculate average read depth of each chromosome from each bam file. For e.g.: 1) average depth of chr1 from bam1, average depth of chr1 from bam2, ...until bam6. 2) average depth of chr1 from all 6 bam together.Kindly guide.
Thanks.
Try the solutions if they don't work then you need to reframe the question with your commands and write a new query in a new post. This is advised. And yes paired end should work.
in addition, if you have already plotted depths it means that you have got a temporal R data.frame (or similar type) in there containing chromosome, position and depth information. splitting such data.frame by chromosome and calculating average values should be straight-forward. I would just suggest to take into account empty values not reported by samtools depth dividing each chromosome depths' sum by the size of the chromosome, and not by the number of depth values.
Why is it wrong to take the number of sequenced bases and divide by the genome size?
because you cannot align all the bases on the genome, some reads are optical duplicates, etc...
Okay, contamination is a fair point, but even if some reads fail to match, that could still be a sign of genetic divergence, and should still count towards genome coverage. Do any of the other suggestions deal with duplicates implicitly?
Qualimap is very fast and excellent tool. thanks
Hi,
I have multiple paired-end bam files from RNA-Seq data, already aligned and computed depth with
samtools depth > bam.depth. Would any one please guide for any straight forward/latest way to compute coverage plots, avg. coverage plots and other metrics from depth files?Picard CollectRNAseqmetrics,RSeQCandQoRTshave not been helpful. :-(Thanks.
Have you run any of the other tools mentioned in this thread (e.g. Qualimap, pileup from BBMap)?
@genomax My bam files are very big and
Qualimaphas not been helpful, so asbbmapwhich is Java-based (memory issues likePicardandQoRTs) and hence I usedsamtools depth. My depth files are also in GB.Both programs will happily accept larger allocations of RAM and can open very large files. If you don't have access to a machine with more RAM then that is a different problem.