Question: Variation & Genotype Calling From NGS Data - Per Sample Or Multi Sample?
8.4 years ago, Travis (2.8k) wrote:


I am wondering whether people generally use per-sample or multi-sample approaches for variation and genotype calling with NGS reads.

I know that when coverage is low, the multi-sample approach helps improve calls, but what if coverage is high, e.g. 20-30x?

In these cases, is single-sample calling OK? Is there still an argument for multi-sample calling, or is it diminished? Or does it not matter at all?

Thanks in advance.

variation next-gen snp sequencing • 4.7k views
8.4 years ago, lh3 (31k, United States) wrote:

It depends on your purpose. If you have lots of samples (e.g. >100) and want to call SNPs across them, multi-sample calling is preferred because it gives a lower false positive rate (FPR). If you call each sample separately and then combine the calls, the errors add up: even a 1% FPR per sample is greatly amplified across 100 samples, and you may see lots of spurious singletons. On the other hand, if you want the best consensus for each individual sample, you should call each sample separately, because joint calling has a higher false negative rate per sample. No single strategy suits all purposes.
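To make the amplification concrete, here is a rough sketch rather than a model of any real caller. It assumes that a given non-variant site has an independent probability `p_per_sample` of being wrongly called in each sample, and that the per-sample call sets are merged by taking their union. Both the independence assumption and the function name are illustrative only; real sequencing errors are often correlated across samples.

```python
def prob_false_call_in_union(p_per_sample: float, n_samples: int) -> float:
    """Chance that a non-variant site is wrongly called in at least one
    sample, assuming independent errors (an illustrative assumption)."""
    # The site stays clean only if every sample avoids the false call.
    return 1.0 - (1.0 - p_per_sample) ** n_samples


if __name__ == "__main__":
    # Hypothetical per-sample error probability of 1%, as in the text above.
    for n in (1, 10, 100):
        print(n, round(prob_false_call_in_union(0.01, n), 3))
```

Under these assumptions, a 1% chance per sample becomes roughly a 63% chance that the site appears in the merged call set of 100 samples, and most of those false calls would show up in only one sample, i.e. as singletons.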


Just to add a clarification: joint calling has a biased false negative rate. It does better if a SNP is shared between samples but worse if it is a singleton. Sometimes this is not an intended feature.

written 8.4 years ago by lh3 (31k)

Perfect answer, thanks.

written 8.4 years ago by Travis (2.8k)

Straightforward and very useful answer.

written 8.4 years ago by Jorge Amigo (11k)