Question: Strategy To Set A Cutoff For An NGS Variant Calling Experiment
9.8 years ago, Jianfeng Mao wrote:

We had individuals of a plant species (F1s, from a cross between the reference genome line and a target line) sequenced by NGS. Variants (SNPs and indels) were called for each target plant from the NGS data of these F1 individuals. Our data are haplotype data: a phased haplotype was called for each target plant (one parent of the F1 individual).

For quality control, we used several values: (1) concordance (the ratio of reads supporting a predicted feature to total coverage); (2) coverage (how many reads supported the variant); (3) base quality (the base quality reported by the sequencing process).

Concordance may be the most important variable for quality control. The best variant calls, judged by concordance, are those with values close to 0.5. Obviously, much smaller values (<0.1) and much bigger values (>0.9) are not good. Coverage may also play an important role: a call with a concordance of 0.5 but a coverage of 0.1 may not be a good call. Base quality may be the most intuitive quality control variable: calls with higher base quality should be the better ones.

I want to find a good strategy for setting a cutoff for our variant calls based on all three variables, or on concordance and coverage alone. I would prefer a statistical approach.
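
One possible statistical way to combine concordance and coverage is an exact binomial test of the supporting read count against the expected fraction of 0.5. The sketch below is only an illustration, not a method anyone in this thread used. It assumes the sites are heterozygous in the F1, so that 0.5 is the right expectation. The names `binom_two_sided_p` and `passes_filter` and the thresholds `min_depth` and `alpha` are hypothetical placeholders to be tuned on real data.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k supporting reads out of n.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for typical read depths (tens to hundreds).
    """
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def passes_filter(alt_reads: int, depth: int,
                  min_depth: int = 10, alpha: float = 0.01) -> bool:
    """Keep a call if it has enough reads and its read ratio is
    statistically compatible with 0.5 (both cutoffs are assumptions)."""
    if depth < min_depth:
        # Too few reads: the test has no power, so reject on coverage.
        return False
    return binom_two_sided_p(alt_reads, depth) >= alpha


if __name__ == "__main__":
    for alt, depth in [(10, 20), (1, 20), (2, 4), (30, 40)]:
        print(alt, depth, passes_filter(alt, depth))
```

The minimum depth is needed because the test alone cannot reject a call with very few reads. Because the test is run on many sites, the `alpha` cutoff would normally need a multiple-testing correction, which this sketch leaves out.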

Could you please give me any ideas or directions on this problem? Thanks in advance.

modified 9.4 years ago by Pablo • written 9.8 years ago by Jianfeng Mao

Do you want to call variants or haplotypes? Your definition of concordance is a bit confusing; in this context it usually means similarity to known calls.

written 9.8 years ago by Casbon

I want to call both haplotypes and variants. I mean that I can get the variants and the haplotype at the same time.

written 9.8 years ago by Jianfeng Mao
9.8 years ago, Pablo wrote:

You might find this presentation useful (they use the transition/transversion ratio, Ti/Tv, to estimate the false discovery rate, FDR).
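
For reference, here is a minimal sketch of how a Ti/Tv ratio can be computed from a list of SNP calls. It is not taken from the presentation. The function names and the `(ref, alt)` input format are assumptions made for illustration. The idea is that purely random base errors give a ratio near 0.5, because each base has one possible transition and two possible transversions, so a call set whose Ti/Tv drops toward 0.5 as the cutoff is loosened is probably picking up false positives. The ratio expected for true variants depends on the species and has to be estimated separately.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def titv_ratio(snps):
    """Ti/Tv for an iterable of (ref, alt) single-base substitutions.

    Entries that are not a change between two of A, C, G, T are skipped.
    Returns None when there are no transversions.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        valid = ref in "ACGT" and alt in "ACGT" and len(ref) == len(alt) == 1
        if not valid or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


if __name__ == "__main__":
    # Hypothetical calls: two transitions and one transversion.
    print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

Computing this ratio separately for the calls kept at each candidate cutoff gives a rough view of how call quality changes with the threshold.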

written 9.8 years ago by Pablo

That is really a good guide to NGS data processing in general, not only quality control. Thanks a lot, Pablo.

We intend to sequence a subset of the genome by Sanger sequencing and then use that as quality control.
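
As a rough sketch of how such a Sanger subset could be used to pick a cutoff, the snippet below computes the fraction of NGS calls confirmed by Sanger at a given score threshold. Everything in it is hypothetical: the function name `validation_rate`, the representation of calls as `(position, score)` pairs, and the two position sets are placeholders for whatever the real data look like.

```python
def validation_rate(calls, sanger_covered, sanger_confirmed, min_score):
    """Fraction of NGS calls with score >= min_score that Sanger confirms.

    calls            : iterable of (position, score) pairs from NGS
    sanger_covered   : set of positions that were Sanger sequenced
    sanger_confirmed : set of positions where Sanger shows the variant
    Only calls inside the Sanger-covered region can be judged.
    Returns None if no call can be evaluated at this threshold.
    """
    tested = [pos for pos, score in calls
              if score >= min_score and pos in sanger_covered]
    if not tested:
        return None
    confirmed = sum(1 for pos in tested if pos in sanger_confirmed)
    return confirmed / len(tested)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    calls = [(100, 0.9), (200, 0.4), (300, 0.8), (400, 0.95)]
    covered = {100, 200, 300}
    confirmed = {100, 200}
    for cutoff in (0.0, 0.5, 0.99):
        print(cutoff, validation_rate(calls, covered, confirmed, cutoff))
```

Scanning a range of thresholds and choosing the loosest one that still reaches an acceptable confirmation rate is one way to turn the Sanger data into a cutoff.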

written 9.8 years ago by Jianfeng Mao