Question

Variation & Genotype Calling From NGS Data - Per Sample Or Multi Sample?

6

12.7 years ago

Travis ★ 2.8k

Hi,

I am wondering whether people generally use per-sample or multi-sample approaches for variation and genotype calling with NGS reads.

I know that when coverage is low, the multi-sample approach helps improve calls, but what if coverage is high, i.e. 20-30x?

In these cases, is single-sample calling OK? Is there still an argument for multi-sample calling, or is it diminished? Or does it not matter at all?

Thanks in advance.

snp variation next-gen sequencing • 7.3k views

ADD COMMENT • link updated 12.7 years ago by lh3 33k • written 12.7 years ago by Travis ★ 2.8k

score 24 · Answer 1 · 2011-08-05

24

12.7 years ago

lh3 33k

It depends on your purpose. If you have lots of samples (e.g. >100) and want to call SNPs across them, multi-sample calling is preferred because it gives a lower false positive rate (FPR). If you call each sample separately and then combine the calls, the errors add up: even a 1% FPR per sample is greatly amplified across 100 samples, and you may see lots of spurious singletons. On the other hand, if you want the best consensus for each sample, you should call each sample separately, since joint calling has a higher false negative rate for each sample. No single strategy suits all purposes.
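To get a rough feel for the amplification, here is a minimal sketch. It assumes the 1% is an independent per-site, per-sample chance of a spurious call. That is a simplifying assumption made for illustration, not necessarily the definition of FPR intended above. The function name `union_false_positive_rate` is hypothetical.

```python
def union_false_positive_rate(per_sample_fpr: float, n_samples: int) -> float:
    """Chance that at least one of n_samples independently called samples
    reports a spurious variant at a given non-variant site.

    Assumption: false calls are independent across samples, so the site
    stays clean only if every sample is clean: (1 - p) ** n.
    """
    return 1.0 - (1.0 - per_sample_fpr) ** n_samples


if __name__ == "__main__":
    # Union of per-sample call sets for 1, 10 and 100 samples at 1% each.
    for n in (1, 10, 100):
        rate = union_false_positive_rate(0.01, n)
        print(f"{n:>3} samples: {rate:.3f}")
```

Under this toy model the merged call set goes from a 1% false call chance per site with one sample to roughly 63% with 100 samples. Because each spurious call typically appears in only one sample, the errors show up as singletons.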

ADD COMMENT • link 12.7 years ago by lh3 33k

8

Just to add a clarification: joint calling has a biased false negative rate. It does better when a SNP is shared between samples but worse when it is a singleton. Sometimes this is not an intended feature.

ADD REPLY • link 12.7 years ago by lh3 33k

0

Perfect answer, thanks.

ADD REPLY • link 12.7 years ago by Travis ★ 2.8k

0

Straightforward and very useful answer.

ADD REPLY • link 12.7 years ago by Jorge Amigo 14k