What Do You Expect As The False Positive And Negative Rate For SNPs And Indels In WGS?
8.4 years ago
William ★ 5.0k

What false positive and false negative rates do you expect for SNPs and indels in a WGS experiment?

On which papers and data sets (in-house or external) do you base this?

Edit: Of course this depends on a lot of things, as mentioned in the comment below. So let's assume a very vanilla situation: the human genome or a popular model organism with a relatively good assembly, a sample relatively close to the reference, possibly excluding difficult-to-sequence/map regions, Illumina 100 x 100 paired-end reads, sequenced to 30x, mapped with BWA, and SNPs and indels called with GATK.
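To make the question concrete: both rates are usually estimated by comparing a call set against a truth set (for example a well-characterized benchmark sample), restricted to regions where that truth is considered reliable. Below is a minimal Python sketch, assuming both sets are already normalized so that the same variant always gets the same (chrom, pos, ref, alt) key; `Variant` and `error_rates` are hypothetical names for illustration. For variant calling the classical false positive rate FP/(FP+TN) says little, because almost every reference position is a true negative, so the false discovery rate FP/(TP+FP) is what is usually reported.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def error_rates(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Exact-key comparison of a call set against a truth set.

    Assumes both sets are already normalized and restricted to the
    same evaluation regions (hypothetical helper, not a library API).
    """
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed by the caller
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        # False discovery rate: fraction of calls that are wrong (1 - precision)
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
        # False negative rate: fraction of true variants missed (1 - sensitivity)
        "FNR": fn / (tp + fn) if tp + fn else 0.0,
    }


# Toy example with made-up variants
truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 300, "AT", "A")}
calls = {Variant("chr1", 100, "A", "G"), Variant("chr1", 500, "T", "C")}
print(error_rates(calls, truth))
# {'TP': 1, 'FP': 1, 'FN': 1, 'FDR': 0.5, 'FNR': 0.5}
```

Exact key matching is the simplest choice and works reasonably for SNPs; indels in particular often need normalization or a positional tolerance before two representations of the same event compare equal.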

Won't the answer depend heavily on species, genetic closeness to the reference sequence, type of sequencing, depth of sequencing, SNP/indel filtering, SNP/indel calling method, and probably a few more things that I haven't listed?

8.4 years ago

See Brad Chapman's blog, Blue Collar Bioinformatics, for articles like this one:

Framework for evaluating variant detection methods: comparison of aligners and callers

There is an entire series on this subject.

Again, I can't find the exact method they used to calculate concordance between indel calls.

8.4 years ago
Vitis ★ 2.5k

O'Rawe et al. 2013 is an excellent paper on the concordance of indel calls from different variant callers. I think it includes some details on how they 'aligned' the indel calls to make them comparable, because correcting and comparing indel start/end sites from different callers is usually not trivial. The main message of the paper is that concordance is low: different callers tend to make their own unique indel calls, and the overlap among callers is not as good as we would like. The link to the paper is: http://genomemedicine.com/content/5/3/28
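Normalization is what makes differently placed representations of the same indel comparable: in a repeat such as a CA run, deleting any one CA copy yields the same haplotype, yet callers may report different positions for it. Below is a minimal sketch of the usual left-align-and-trim procedure (the kind of operation tools such as `bcftools norm` perform), assuming a 1-based VCF-style position and the chromosome supplied as a plain Python string. `left_normalize` is a hypothetical helper for illustration, not the code used by O'Rawe et al.

```python
def left_normalize(seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a variant against reference sequence `seq`.

    `pos` is 1-based; `ref` must match `seq` at that position.
    """
    if seq[pos - 1 : pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    changed = True
    while changed:
        changed = False
        # Drop a base shared at the right end of both alleles
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pad both with the preceding reference base
        # (at the chromosome start there is nothing to pad with, so we stop)
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim bases shared at the left end, keeping at least one base per allele
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Two representations of the same 2 bp deletion in a CA repeat
seq = "GCACACAT"
print(left_normalize(seq, 5, "ACA", "A"))  # (1, 'GCA', 'G')
print(left_normalize(seq, 3, "ACA", "A"))  # (1, 'GCA', 'G')
```

After normalization both calls collapse to the same leftmost key, so an exact comparison now treats them as one event.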

"For indel calls, initial agreement between SOAPindel, SAMtools and GATK was very low at 3.0% (see Additional file 1, Figure S8). Indel coordinates were subsequently left-normalized and intervalized using a total range of 20 genomic coordinates (10 bp in each direction of their genomic coordinates)"
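The quoted procedure (left-normalize, then widen each indel to a window of 10 bp on either side) can be sketched roughly as follows. This is an interpretation, not the paper's code: it assumes positions are already left-normalized, treats two indels as the same event if their ±10 bp windows on the same chromosome overlap (that is, their positions differ by at most 20 bp), ignores alleles and indel length, and uses a hypothetical `shared_fraction` metric, the fraction of one caller's indels with a windowed match in every other caller. The paper's 3.0% figure may have been computed differently.

```python
from bisect import bisect_left
from collections import defaultdict

# A call is reduced to (chromosome, left-normalized 1-based position).
Call = tuple[str, int]


def index_by_chrom(calls: list[Call]) -> dict[str, list[int]]:
    """Group positions by chromosome and sort them for binary search."""
    idx: dict[str, list[int]] = defaultdict(list)
    for chrom, pos in calls:
        idx[chrom].append(pos)
    for positions in idx.values():
        positions.sort()
    return idx


def has_window_match(call: Call, idx: dict[str, list[int]], flank: int = 10) -> bool:
    """True if some indexed call's window overlaps this call's window.

    Windows [p - flank, p + flank] and [q - flank, q + flank] overlap
    exactly when |p - q| <= 2 * flank.
    """
    chrom, pos = call
    positions = idx.get(chrom, [])
    i = bisect_left(positions, pos - 2 * flank)  # first position >= lower bound
    return i < len(positions) and positions[i] <= pos + 2 * flank


def shared_fraction(calls: list[Call], others: list[list[Call]], flank: int = 10) -> float:
    """Fraction of `calls` with a windowed match in every call set in `others`."""
    if not calls:
        return 0.0
    indexes = [index_by_chrom(o) for o in others]
    shared = sum(
        1 for c in calls if all(has_window_match(c, idx, flank) for idx in indexes)
    )
    return shared / len(calls)


# Toy example with made-up positions for three callers
gatk = [("chr1", 100), ("chr1", 500), ("chr2", 50)]
samtools = [("chr1", 115), ("chr2", 300)]
soapindel = [("chr1", 90), ("chr1", 530)]
print(shared_fraction(gatk, [samtools, soapindel]))  # one of three GATK indels is shared
```

A tolerant match like this raises apparent concordance but can also pair up distinct nearby indels, so the window size is a trade-off rather than a fixed rule.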
