Variation & Genotype Calling From NGS Data: Per-Sample or Multi-Sample?
12.7 years ago
Travis ★ 2.8k

Hi,

I am wondering whether people generally use per-sample or multi-sample approaches for variation and genotype calling with NGS reads.

I know that when coverage is low, the multi-sample approach helps improve calls, but what if coverage is high, i.e. 20-30x?

In these cases, is single-sample calling OK? Is there still an argument for multi-sample calling, or is it diminished? Or does it not matter at all?

Thanks in advance.

snp variation next-gen sequencing
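One way to see why coverage matters is to treat the reads at a heterozygous site as a binomial draw: at depth d, each read carries the alternative allele with probability 0.5 (this ignores sequencing error, mapping bias and duplicates). The sketch below assumes a hypothetical caller that needs at least k = 3 alternative reads to call a heterozygote; neither the threshold nor the model comes from any specific tool.

```python
from math import comb


def prob_at_least_k_alt(depth: int, k: int, p_alt: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(depth, p_alt).

    Simplified model (assumption): each read independently samples one of
    the two alleles of a heterozygote, with no sequencing error.
    """
    return sum(
        comb(depth, i) * p_alt**i * (1 - p_alt) ** (depth - i)
        for i in range(k, depth + 1)
    )


# Hypothetical calling rule: require at least 3 alternative reads.
MIN_ALT_READS = 3
for depth in (4, 10, 25):
    print(depth, round(prob_at_least_k_alt(depth, MIN_ALT_READS), 4))
```

Under these assumptions a heterozygote is detected with probability 5/16 (about 0.31) at 4x but with probability above 0.9999 at 25x, which is why pooling evidence across samples helps much more at low coverage.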
12.7 years ago
lh3 33k

It depends on your purpose. If you have many samples (e.g. >100) and want to call SNPs across them, multi-sample calling is preferred because it gives a lower false positive rate (FPR). If you call each sample separately and then combine the calls, the errors accumulate: even a 1% FPR per sample is greatly amplified across 100 samples, and you may see many spurious singletons. On the other hand, if you want the best consensus for each individual sample, you should call each sample separately, because joint calling has a higher false negative rate for each sample. No single strategy suits all purposes.
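To put rough numbers on the amplification, the sketch below reads "1% FPR per sample" as a 1% chance that any given sample is falsely called at a given non-variant site, with errors independent across samples. Both are simplifying assumptions, not properties of a particular caller. Under them, the chance that a merged call set contains a false call at that site is 1 - (1 - p)^n, and the chance that exactly one sample carries it (a spurious singleton) is n·p·(1 - p)^(n - 1).

```python
def prob_any_false_call(p_fp: float, n_samples: int) -> float:
    """Probability that at least one of n independent samples is falsely
    called at a non-variant site (assumes independent per-sample errors)."""
    return 1.0 - (1.0 - p_fp) ** n_samples


def prob_false_singleton(p_fp: float, n_samples: int) -> float:
    """Probability that exactly one sample is falsely called at the site,
    i.e. the merged call set shows a spurious singleton there."""
    return n_samples * p_fp * (1.0 - p_fp) ** (n_samples - 1)


for n in (1, 10, 100):
    print(n, round(prob_any_false_call(0.01, n), 3),
          round(prob_false_singleton(0.01, n), 3))
```

With p = 0.01 and n = 100, about 63% of such sites end up with at least one false call in the merged set, and about 37% carry it as a singleton, so the majority of these false sites are singletons.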


Just to add a clarification: joint calling has a biased false negative rate. It does better when a SNP is shared between samples but worse when it is a singleton. Sometimes this is not an intended feature.
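The bias can be illustrated with a deliberately crude toy model; it is not how samtools, GATK or any other caller actually implements joint calling. Each sample is summarised by a hypothetical likelihood ratio r = P(reads | het) / P(reads | hom-ref) (only two genotypes are considered), the cohort carrier fraction f is estimated by EM, and f is then used as the prior for every sample. The same weakly supported sample is scored once alongside 30 clear carriers and once as the only carrier.

```python
def het_posterior(r: float, f: float) -> float:
    """Posterior P(het) for one sample, given its likelihood ratio r and prior f."""
    return f * r / (f * r + (1.0 - f))


def em_carrier_fraction(ratios: list[float], n_iter: int = 200, f: float = 0.5) -> float:
    """EM estimate of the carrier fraction f shared by all samples at one site.

    E-step: posterior P(het) per sample under the current f.
    M-step: f = mean of those posteriors.
    """
    for _ in range(n_iter):
        posteriors = [het_posterior(r, f) for r in ratios]
        f = sum(posteriors) / len(posteriors)
    return f


# Hypothetical evidence: reads favour het over hom-ref by 20:1 in the test sample.
WEAK_SAMPLE = 20.0
shared = [WEAK_SAMPLE] + [1e6] * 30 + [1e-6] * 69   # 30 other clear carriers
singleton = [WEAK_SAMPLE] + [1e-6] * 99             # no other carriers

for label, ratios in (("shared", shared), ("singleton", singleton)):
    f_hat = em_carrier_fraction(ratios)
    print(label, round(f_hat, 3), round(het_posterior(WEAK_SAMPLE, f_hat), 3))
```

In this toy, identical read evidence gives the test sample a posterior of roughly 0.9 when the allele is shared, while as a singleton the estimated carrier fraction collapses towards zero and the sample is not called. How a singleton compares with separate per-sample calling depends on the fixed prior the single-sample caller would use; the toy only shows that singletons get no support borrowed from other samples.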


Perfect answer, thanks.


Straightforward and very useful answer.
