Error-rate of a subset of the genome
1
0
Entering edit mode
7 months ago
skanterakis ▴ 120

Does anyone know of an efficient way to calculate error-rate (% bases that mis-match to the reference) from a bam, in a subset of the genome defined by a bed file? By efficient I was hoping for a c/htslib based method. Thank you!

WGS error bam bed • 285 views
ADD COMMENT
0
Entering edit mode

igvtools will allow you to get a count of bases present at each site from a BAM file. You can subset your BAM before doing that.

ADD REPLY
0
Entering edit mode

as far as I can see igvtools count --bases counts the occurrence of each base per region. Are you suggesting to then compare to the reference base with another command?

ADD REPLY
0
Entering edit mode

Do you want to calculate the sequencing error rate, or naturally occurring polymorphisms (SNPs, indels, and so on)?

ADD REPLY
0
Entering edit mode

The sequencing error rate. I don't mind polymorphisms being included in the calculation though.

ADD REPLY
0
Entering edit mode
7 months ago
skanterakis ▴ 120

looks like the 5th column of samtools mpileup -f hg38.fa the.bam could be used or this: https://github.com/genome/bam-readcount

ADD COMMENT

Login before adding your answer.

Traffic: 2108 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6