Strategy For Setting Cutoffs In An NGS Variant Calling Experiment
13.2 years ago

We had individuals of a plant species (F1s from a cross between the reference-genome plant and an objective plant) sequenced by NGS. Variants (SNPs and indels) were called for each objective plant from the NGS data of these F1 individuals. Our data are haplotype data: a phased haplotype was called for one objective plant (one parent of the F1 individual).

For quality control, we use three values: (1) concordance (the ratio of reads supporting a predicted feature to total coverage); (2) coverage (the number of reads supporting the variant); (3) base quality (the base quality reported by the sequencing process).

Concordance may be the most important variable for quality control. The best variant calls by concordance are those with values near 0.5. Clearly, very small values (<0.1) and very large values (>0.9) are not good. Coverage may also play an important role: a call with a concordance of 0.5 but a coverage of 0.1 may not be a good call. Base quality may be the most intuitive quality-control variable: calls with higher base quality should be better.
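One way to combine concordance and coverage into a single statistic, sketched here under an assumption not stated above: if a true variant is heterozygous in the F1, the variant-supporting reads should be roughly binomial with success probability 0.5, so an exact binomial test can flag calls whose concordance is implausibly far from 0.5 given the read depth. The function names, the `min_depth` of 8, and the `alpha` of 0.01 are hypothetical placeholders, not recommended values.

```python
from math import comb


def binom_two_sided_p(alt_reads, depth, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest depths (up to a few hundred).
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    pmf = [comb(depth, k) * p**k * (1 - p) ** (depth - k)
           for k in range(depth + 1)]
    observed = pmf[alt_reads]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def keep_call(alt_reads, depth, min_depth=8, alpha=0.01):
    """Keep a call if depth is adequate and the allele balance is
    compatible with the assumed 0.5 expectation (both thresholds are
    illustrative assumptions)."""
    if depth < min_depth:
        return False
    return binom_two_sided_p(alt_reads, depth) >= alpha


if __name__ == "__main__":
    for alt, depth in [(10, 20), (1, 20), (18, 20), (2, 4)]:
        print(alt, depth, keep_call(alt, depth))
```

With this test, a concordance of 0.5 at very low depth is rejected by the depth floor, while an extreme concordance at high depth is rejected by the p-value, so the two variables no longer need separate hand-picked cutoffs.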

I want to find a good strategy for setting cutoffs on our variant calls based on all three variables, or on concordance and coverage alone. I would prefer a statistical approach.

Could you give me any ideas or directions on this problem? Thanks in advance.

quality next-gen sequencing • 3.7k views

Do you want to call variants or haplotypes? Your definition of concordance is a bit confusing; in this context it usually means similarity to known calls.


I want to call both haplotypes and variants. I mean that I can get the variants and the haplotype at the same time.

13.2 years ago
Pablo ★ 1.9k

You might find this presentation useful (they use the Ti/Tv ratio to estimate the FDR):

http://www.broadinstitute.org/gsa/wiki/images/a/ac/Ngs_tutorial_depristo_1210.pdf
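As a rough illustration of the Ti/Tv idea, here is a minimal sketch that computes the transition/transversion ratio of a SNP set. The `estimated_fdr` helper rests on two assumptions that are not established for your data: that false-positive SNPs are random substitutions with a Ti/Tv near 0.5 (each base has one possible transition and two possible transversions), and that you know the expected Ti/Tv for true variants in your species. Both function names and the `expected` value in the usage lines are hypothetical.

```python
BASES = set("ACGT")
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt)."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


def estimated_fdr(observed, expected, random_titv=0.5):
    """Crude FDR estimate, assuming the call set is a mixture of true
    variants (Ti/Tv = expected) and random errors (Ti/Tv = random_titv)."""
    fdr = 1.0 - (observed - random_titv) / (expected - random_titv)
    return min(1.0, max(0.0, fdr))


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
    ratio = ti_tv_ratio(calls)
    # expected=2.1 is only a placeholder; it must be chosen per species.
    print(ratio, estimated_fdr(ratio, expected=2.1))
```

Computing the ratio separately for calls above and below a candidate cutoff shows whether the cutoff is removing mostly error-like calls.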


That is a really good guide to NGS data processing, not only quality control. Thanks a lot, Pablo.

We intend to sequence a subset of the genome by Sanger sequencing and then use that as quality control.
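A minimal sketch of how the Sanger results could be summarised, assuming each tested NGS call is simply scored as confirmed or not confirmed: the validation rate with a Wilson score confidence interval, which behaves better than the plain normal approximation when the validated subset is small. The function name and the counts in the usage lines are hypothetical.

```python
from math import sqrt


def wilson_interval(confirmed, tested, z=1.96):
    """Wilson score interval for the proportion of confirmed calls.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need tested > 0 and 0 <= confirmed <= tested")
    p = confirmed / tested
    denom = 1 + z * z / tested
    center = (p + z * z / (2 * tested)) / denom
    half = z * sqrt(p * (1 - p) / tested
                    + z * z / (4 * tested * tested)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Placeholder counts: 45 of 50 Sanger-tested calls confirmed.
    low, high = wilson_interval(45, 50)
    print(f"validation rate 0.90, interval {low:.3f} to {high:.3f}")
```

Comparing these intervals between calls that pass and fail a candidate cutoff gives a direct check on whether the cutoff is doing useful work.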
