Genozip calculates coverage a bit differently, based on the number of bases rather than reads - ignoring hard-clipped and soft-clipped bases, as well as unmapped, secondary, supplementary and duplicate alignments. It can also calculate approximate coverage directly on a FASTQ file.
$ genocat --coverage my-sample.bam.genozip
--coverage for: my-sample.bam.genozip
Contig LN Reads Bases Bases Depth
1 249.3 Mb 61,215,549 9.0 Gb 7.1 % 36.10
2 243.2 Mb 63,342,488 9.3 Gb 7.3 % 38.26
3 198.0 Mb 50,819,357 7.5 Gb 5.9 % 37.76
4 191.2 Mb 48,131,726 7.1 Gb 5.6 % 37.04
5 180.9 Mb 46,495,225 6.8 Gb 5.4 % 37.81
6 171.1 Mb 43,659,445 6.4 Gb 5.0 % 37.53
7 159.1 Mb 41,171,496 6.0 Gb 4.7 % 38.01
8 146.4 Mb 38,081,676 5.6 Gb 4.4 % 38.27
9 141.2 Mb 31,755,198 4.7 Gb 3.7 % 33.05
10 135.5 Mb 34,954,121 5.1 Gb 4.0 % 37.90
11 135.0 Mb 35,234,129 5.2 Gb 4.1 % 38.38
12 133.9 Mb 34,523,219 5.1 Gb 4.0 % 37.91
13 115.2 Mb 24,526,664 3.6 Gb 2.8 % 31.32
14 107.3 Mb 23,525,557 3.5 Gb 2.7 % 32.21
15 102.5 Mb 22,613,925 3.3 Gb 2.6 % 32.41
16 90.4 Mb 23,675,707 3.5 Gb 2.7 % 38.41
17 81.2 Mb 22,148,439 3.2 Gb 2.5 % 40.01
18 78.1 Mb 19,542,517 2.9 Gb 2.3 % 36.81
19 59.1 Mb 16,374,766 2.4 Gb 1.9 % 40.50
20 63.0 Mb 16,530,922 2.4 Gb 1.9 % 38.52
21 48.1 Mb 9,811,790 1.4 Gb 1.1 % 29.93
22 51.3 Mb 10,238,901 1.5 Gb 1.2 % 29.26
X 155.3 Mb 20,421,365 3.0 Gb 2.3 % 19.29
Y 59.4 Mb 3,971,007 582.0 Mb 0.5 % 9.80
MT 16.6 Kb 105,906 15.5 Mb 0.0 % 937.41
Other contigs 28,989,258 4.1 Gb 3.2 %
-----
All contigs 771,860,353 113.3 Gb 88.8% 36.60
Soft clip 1.2 Gb 0.9 %
Unmapped 17,071,958 2.4 Gb 1.9 %
Supplementary 1,028,655 52.2 Mb 0.0 %
Duplicate 71,068,615 10.6 Gb 8.3 %
TOTAL 861,029,581 127.5 Gb 100.0%
Documentation: https://genozip.com/coverage.html
Installing: https://genozip.com/installing.html
Publication: https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor
People seem to be equally devided between GATK, Samtools and BedTools.
The Depth of Coverage is depreciated in GATK4. It's only available in GATK3. See here
DepthOfCoverage is available as beta https://gatk.broadinstitute.org/hc/en-us/articles/360041851491-DepthOfCoverage-BETA-#--intervals
This is what I ended up using, after trying bedtools
Most answers seems to be very old and hence would like to have updated suggestions.
I have 6 bam files and I have used
samtools depth
to calculate chromosome wise depth for all chromosomes and then plotted in R. While looking at the plots at 2-3 places, depth shows upto 200-3500 and hence I would like to calculate average read depth of each chromosome from each bam file. For e.g.: 1) average depth of chr1 from bam1, average depth of chr1 from bam2, ...until bam6. 2) average depth of chr1 from all 6 bam together.Kindly guide.
Thanks.
Try the solutions if they don't work then you need to reframe the question with your commands and write a new query in a new post. This is advised. And yes paired end should work.
in addition, if you have already plotted depths it means that you have got a temporal R data.frame (or similar type) in there containing chromosome, position and depth information. splitting such data.frame by chromosome and calculating average values should be straight-forward. I would just suggest to take into account empty values not reported by samtools depth dividing each chromosome depths' sum by the size of the chromosome, and not by the number of depth values.
Why is it wrong to take the number of sequenced bases and divide by the genome size?
because you cannot align all the bases on the genome, some reads are optical duplicates, etc...
Okay, contamination is a fair point, but even if some reads fail to match, that could still be a sign of genetic divergence, and should still count towards genome coverage. Do any of the other suggestions deal with duplicates implicitly?
Qualimap is very fast and excellent tool. thanks
Hi,
I have multiple paired-end bam files from RNA-Seq data, already aligned and computed depth with
samtools depth > bam.depth
. Would any one please guide for any straight forward/latest way to compute coverage plots, avg. coverage plots and other metrics from depth files?Picard CollectRNAseqmetrics
,RSeQC
andQoRTs
have not been helpful. :-(Thanks.
Have you run any of the other tools mentioned in this thread (e.g. Qualimap, pileup from BBMap)?
@genomax My bam files are very big and
Qualimap
has not been helpful, so asbbmap
which is Java-based (memory issues likePicard
andQoRTs
) and hence I usedsamtools depth
. My depth files are also in GB.Both programs will happily accept larger allocations of RAM and can open very large files. If you don't have access to a machine with more RAM then that is a different problem.