Question: What do you expect as the false positive and negative rate for SNPs and indels in WGS?
5.6 years ago by
William (4.4k) wrote:

What do you expect as the false positive and false negative rates for SNPs and indels in a WGS experiment?

On which papers and data sets (inhouse or external) do you base this?

Edit: Of course this depends on a lot of things, as mentioned in the comment below. So let's assume a very vanilla situation: human genome or a popular model organism with a relatively good assembly, a sample relatively close to the reference, maybe excluding difficult-to-sequence/map regions, Illumina 100 x 100 reads, sequenced to 30x, mapped with BWA, SNPs and indels called with GATK.

indel genotype snp qualitycontrol • 3.0k views
modified 5.6 years ago by Vitis (1.9k) • written 5.6 years ago by William (4.4k)

Won't the answer depend heavily on species, genetic closeness to the reference sequence, type of sequencing, depth of sequencing, SNP/indel filtering, SNP/indel calling method, and probably a few more things that I haven't listed?

written 5.6 years ago by Devon Ryan (88k)
5.6 years ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

See Brad Chapman's blog, Blue Collar Bioinformatics, for articles like this one:

Framework for evaluating variant detection methods: comparison of aligners and callers

There is an entire series on this subject.


Again I can't seem to find the exact method they used to calculate the concordance between indels.

written 5.6 years ago by William (4.4k)
5.6 years ago by
New York
Vitis (1.9k) wrote:

O'Rawe et al. 2013 is an excellent paper describing the concordance of indel calls from different variant callers. I think it includes some details about how they 'aligned' the indel calls to make them comparable, because it is usually not trivial to correct and compare indel start/end sites from different callers. The main message of the paper is that concordance is low: each caller tends to produce its own unique indel calls, and the overlap among callers is not as good as we would like to see. The link to the paper is:


"For indel calls, initial agreement between SOAPindel, SAMtools and GATK was very low at 3.0% (see Additional file 1, Figure S8). Indel coordinates were subsequently left-normalized and intervalized using a total range of 20 genomic coordinates (10 bp in each direction of their genomic coordinates)"

written 5.6 years ago by William (4.4k)
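
For anyone who wants to try this kind of comparison, here is a minimal Python sketch of the two steps named in the quote: left-normalizing an indel, then matching calls from two callers within a 10 bp window on either side. It is not the paper's code. The function names, the toy reference string, and the rule "a call matches if the other caller has a call within 10 bp on the same chromosome" are my own assumptions about what "intervalized" could mean in practice.

```python
from bisect import bisect_left


def left_normalize(pos, ref, alt, genome):
    """Shift an indel as far left as possible (hypothetical helper).

    pos is the 1-based position of the first base of `ref`;
    genome is the reference sequence of one chromosome as a plain string.
    """
    while True:
        changed = False
        # Trim a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def concordant_calls(calls_a, calls_b, flank=10):
    """Return calls from A that have a call from B within `flank` bp.

    Calls are (chrom, pos) tuples; the matching rule is an assumption.
    """
    by_chrom = {}
    for chrom, pos in calls_b:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    shared = []
    for chrom, pos in calls_a:
        positions = by_chrom.get(chrom, [])
        # First B position that is >= pos - flank.
        i = bisect_left(positions, pos - flank)
        if i < len(positions) and positions[i] <= pos + flank:
            shared.append((chrom, pos))
    return shared


# Toy reference: the same CA deletion reported at two different positions.
genome = "TTCACACAGG"
norm_right = left_normalize(6, "ACA", "A", genome)
norm_mid = left_normalize(4, "ACA", "A", genome)

caller_a = [("chr1", 100), ("chr1", 500), ("chr2", 40)]
caller_b = [("chr1", 108), ("chr1", 520), ("chr3", 40)]
shared = concordant_calls(caller_a, caller_b)
```

In the toy example both representations of the deletion normalize to the same record, and only the chr1:100 call has a partner within the window. Dividing `len(shared)` by the number of calls in one set (or in the union) gives a concordance figure, but which denominator the paper used is not stated in the quote above.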