Error-rate of a subset of the genome
1
0
Entering edit mode
7 months ago
skanterakis ▴ 120

Does anyone know of an efficient way to calculate error-rate (% bases that mis-match to the reference) from a bam, in a subset of the genome defined by a bed file? By efficient I was hoping for a c/htslib based method. Thank you!

WGS error bam bed • 285 views
0
Entering edit mode

igvtools will allow you to get a count of bases present at each site from a BAM file. You can subset your BAM before doing that.

0
Entering edit mode

as far as I can see igvtools count --bases counts the occurrence of each base per region. Are you suggesting to then compare to the reference base with another command?

0
Entering edit mode

Do you want to calculate the sequencing error rate, or naturally occurring polymorphisms (SNPs, indels, and so on)?

0
Entering edit mode

The sequencing error rate. I don't mind polymorphisms being included in the calculation though.

0
Entering edit mode
7 months ago
skanterakis ▴ 120

looks like the 5th column of samtools mpileup -f hg38.fa the.bam could be used or this: https://github.com/genome/bam-readcount