Question: Variation & Genotype Calling From NGS Data - Per Sample Or Multi Sample?
I am wondering whether people generally use per-sample or multi-sample approaches for variation and genotype calling with NGS reads.

I know that when coverage is low, the multi-sample approach helps improve calls, but what if coverage is high, e.g. 20-30x?

In these cases, is single-sample calling OK? Is there still an argument for multi-sample calling, or is it diminished? Or does it not matter at all?

Thanks in advance.

It depends on your purpose. If you have lots of samples (e.g. >100) and want to call SNPs across them, multi-sample calling is preferred because it gives a lower false positive rate (FPR). Errors add up if you call each sample separately and then combine the calls: each sample contributes its own false positives, so even a 1% FPR per sample is greatly amplified across 100 samples, and you may see lots of spurious singletons. On the other hand, if you want the best consensus for each sample, you should call each sample separately, because joint calling has a higher false negative rate for each sample. No single strategy suits all purposes.
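
To make the "errors add up" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes (my simplification, not something any real caller guarantees) that each sample has the same independent chance of a false call at a given non-variant site. The function name and the numbers are purely illustrative.

```python
def union_false_positive_rate(per_sample_fpr: float, n_samples: int) -> float:
    """Chance that a truly non-variant site is called in at least one sample.

    Assumes false calls are independent across samples with the same
    per-sample probability (a simplifying assumption for illustration).
    """
    # P(no sample makes a false call) = (1 - p)^n, so take the complement.
    return 1.0 - (1.0 - per_sample_fpr) ** n_samples


if __name__ == "__main__":
    # Hypothetical 1% per-sample error rate, merged over growing cohorts.
    for n in (1, 10, 100):
        rate = union_false_positive_rate(0.01, n)
        print(f"{n:>3} samples: {rate:.3f}")
```

Under these assumptions, a 1% per-sample rate grows to roughly 63% once the calls from 100 samples are merged, and almost all of those false calls appear in a single sample, which is where the spurious singletons come from.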

Just to add a clarification: joint calling has a biased false negative rate. It does better if a SNP is shared between samples but worse if it is a singleton. Sometimes this is not the behavior you want.

Perfect answer thanks.

Straightforward and very useful answer.

