Question

What do you expect as the false positive and negative rate for SNPs and indels in WGS?

0

10.7 years ago

William ★ 5.3k

What do you expect as the false positive and false negative rates for SNPs and indels in a WGS experiment?

On which papers and data sets (inhouse or external) do you base this?

Edit: Of course, this depends on a lot of things, as mentioned in the comment below. So let's assume a very vanilla situation: a human genome or popular model organism with a relatively good assembly, a sample relatively close to the reference, perhaps excluding difficult-to-sequence/map regions, Illumina 100 x 100 reads, sequenced at 30x, mapped with BWA, and SNPs and indels called with GATK.

qualitycontrol snp indel genotype • 4.5k views

ADD COMMENT • link updated 10.7 years ago by Vitis ★ 2.5k • written 10.7 years ago by William ★ 5.3k

1

Won't the answer depend heavily on species, genetic closeness to the reference sequence, type of sequencing, depth of sequencing, SNP/indel filtering, SNP/indel calling method, and probably a few more things that I haven't listed?

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

score 3 · Answer 1 · 2013-07-25

3

10.7 years ago

Istvan Albert 100k

See Brad Chapman's blog, Blue Collar Bioinformatics, for articles like this one:

Framework for evaluating variant detection methods: comparison of aligners and callers

There is an entire series on this subject.

ADD COMMENT • link 10.7 years ago by Istvan Albert 100k

0

Again, I can't seem to find the exact method they used to calculate the concordance between indel calls.

ADD REPLY • link 10.7 years ago by William ★ 5.3k

score 2 · Answer 2 · 2013-07-25

2

10.7 years ago

Vitis ★ 2.5k

O'Rawe et al. 2013 is an excellent paper describing the concordance of indel calls from different variant callers. I think it gives some details about how they 'aligned' the indel calls to make them comparable, because it is usually not trivial to correct and compare indel start/end sites from different callers. Also, the main message of the paper is that the concordance is low, which means different callers usually have their own unique indel calls, and the overlap among callers is not as good as we would like to see. The link to the paper is: http://genomemedicine.com/content/5/3/28

ADD COMMENT • link 10.7 years ago by Vitis ★ 2.5k

0

"For indel calls, initial agreement between SOAPindel, SAMtools and GATK was very low at 3.0% (see Additional file 1, Figure S8). Indel coordinates were subsequently left-normalized and intervalized using a total range of 20 genomic coordinates (10 bp in each direction of their genomic coordinates)"
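To make the quoted procedure concrete, here is a minimal sketch of how such an intervalized comparison could work. It is my reading of the description, not the paper's actual code. It assumes the indel positions have already been left-normalized (for example with a tool such as `bcftools norm`), that each call is reduced to a (chromosome, position) pair, and that two calls count as concordant when their 10 bp flanking windows overlap. The names `intervalize`, `overlaps` and `shared_calls` are made up for illustration.

```python
from typing import List, Tuple

Call = Tuple[str, int]          # (chromosome, left-normalized position)
Window = Tuple[str, int, int]   # (chromosome, start, end), inclusive


def intervalize(calls: List[Call], flank: int = 10) -> List[Window]:
    """Turn each call into a window of `flank` bp on either side."""
    return [(chrom, max(1, pos - flank), pos + flank) for chrom, pos in calls]


def overlaps(a: Window, b: Window) -> bool:
    """True if two inclusive windows on the same chromosome overlap."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def shared_calls(calls_a: List[Call], calls_b: List[Call], flank: int = 10) -> List[Call]:
    """Calls from A whose window overlaps the window of at least one call in B.

    Assumption: window overlap is the concordance criterion. With this choice,
    two calls match when their positions differ by at most 2 * flank.
    """
    windows_b = intervalize(calls_b, flank)
    return [
        call
        for call, win in zip(calls_a, intervalize(calls_a, flank))
        if any(overlaps(win, other) for other in windows_b)
    ]


# Toy example with made-up positions for two hypothetical callers.
caller_a = [("chr1", 100), ("chr1", 500), ("chr2", 100)]
caller_b = [("chr1", 115), ("chr1", 560), ("chr3", 100)]

shared = shared_calls(caller_a, caller_b)
fraction_of_a_shared = len(shared) / len(caller_a)
print(shared, fraction_of_a_shared)
```

The all-against-all comparison is quadratic, which is fine for a toy example but would need sorting or an interval tree for genome-wide call sets. A stricter criterion (for example, requiring the other call's position to fall inside the window, or requiring matching alleles) would give lower concordance, so the exact definition matters when comparing numbers across papers.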

ADD REPLY • link 10.7 years ago by William ★ 5.3k