8.1 years ago by
Bergen, Norway
And the mandatory R solution. The Rsamtools package can be used to read in the BAM file.
Here is a code example:
library(Rsamtools)
bamcoverage <- function (bamfile) {
# read in the bam file
bam <- scanBam(bamfile)[[1]] # the result comes in nested lists
# filter reads without match position
ind <- ! is.na(bam$pos)
## remove non-matches, they are not relevant to us
bam <- lapply(bam, function(x) x[ind])
ranges <- IRanges(start=bam$pos, width=bam$qwidth, names=make.names(bam$qname, unique=TRUE))
## names of the bam data frame:
## "qname" "flag" "rname" "strand" "pos" "qwidth"
## "mapq" "cigar" "mrnm" "mpos" "isize" "seq" "qual"
## construc: genomic ranges object containing all reads
ranges <- GRanges(seqnames=Rle(bam$rname), ranges=ranges, strand=Rle(bam$strand), flag=bam$flag, readid=bam$rname )
## returns a coverage for each reference sequence (aka. chromosome) in the bam file
return (mean(coverage(ranges)))
}
And the output:
> bamcoverage("~/test.bam")
gi|225184640|emb|AL009126.3|
15.35720
The good thing about this is you can do much more with the genomic ranges object than computing the coverage, the downside is, that the code is not that efficient.
People seem to be equally devided between GATK, Samtools and BedTools.
Why is it wrong to take the number of sequenced bases and divide by the genome size?
because you cannot align all the bases on the genome, some reads are optical duplicates, etc...
Okay, contamination is a fair point, but even if some reads fail to match, that could still be a sign of genetic divergence, and should still count towards genome coverage. Do any of the other suggestions deal with duplicates implicitly?
Most answers seems to be very old and hence would like to have updated suggestions.
I have 6 bam files and I have used
samtools depth
to calculate chromosome wise depth for all chromosomes and then plotted in R. While looking at the plots at 2-3 places, depth shows upto 200-3500 and hence I would like to calculate average read depth of each chromosome from each bam file. For e.g.: 1) average depth of chr1 from bam1, average depth of chr1 from bam2, ...until bam6. 2) average depth of chr1 from all 6 bam together.Kindly guide.
Thanks.
Try the solutions if they don't work then you need to reframe the question with your commands and write a new query in a new post. This is advised. And yes paired end should work.
in addition, if you have already plotted depths it means that you have got a temporal R data.frame (or similar type) in there containing chromosome, position and depth information. splitting such data.frame by chromosome and calculating average values should be straight-forward. I would just suggest to take into account empty values not reported by samtools depth dividing each chromosome depths' sum by the size of the chromosome, and not by the number of depth values.
Qualimap is very fast and excellent tool. thanks
Hi,
I have multiple paired-end bam files from RNA-Seq data, already aligned and computed depth with
samtools depth > bam.depth
. Would any one please guide for any straight forward/latest way to compute coverage plots, avg. coverage plots and other metrics from depth files?Picard CollectRNAseqmetrics
,RSeQC
andQoRTs
have not been helpful. :-(Thanks.
Have you run any of the other tools mentioned in this thread (e.g. Qualimap, pileup from BBMap)?
@genomax My bam files are very big and
Qualimap
has not been helpful, so asbbmap
which is Java-based (memory issues likePicard
andQoRTs
) and hence I usedsamtools depth
. My depth files are also in GB.Both programs will happily accept larger allocations of RAM and can open very large files. If you don't have access to a machine with more RAM then that is a different problem.