Does anyone know of an efficient way to calculate error-rate (% bases that mis-match to the reference) from a bam, in a subset of the genome defined by a bed file? By efficient I was hoping for a c/htslib based method. Thank you!
igvtools will allow you to get a count of bases present at each site from a BAM file. You can subset your BAM before doing that.
as far as I can see igvtools count --bases counts the occurrence of each base per region. Are you suggesting to then compare to the reference base with another command?
igvtools count --bases
Do you want to calculate the sequencing error rate, or naturally occurring polymorphisms (SNPs, indels, and so on)?
The sequencing error rate. I don't mind polymorphisms being included in the calculation though.
looks like the 5th column of samtools mpileup -f hg38.fa the.bam could be used or this: https://github.com/genome/bam-readcount
samtools mpileup -f hg38.fa the.bam
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy