How does samtools mpileup handle multiple .bam inputs?
3.7 years ago
traviata

samtools mpileup accepts multiple .bam files as input. When samtools computes depth, are these files simply concatenated, or does samtools combine the data from multiple .bam files in some special way?

The samtools FAQ on GitHub seems to address this, but I'm not sure how to interpret it:

1. Between single- and multi-sample variant calling, which is preferred?

By using multi-sample calling, we gain power on SNPs shared between samples, but lose power on singleton SNPs. Here is a way of thinking of this. Suppose we have a 1% false positive rate (FPR) for variant calling from one sample. If we call SNPs from 100 samples separately and then combine the calls, the FPR would be around 10-20% (not 100% because more SNPs are found given 100 samples). To retain an acceptable FPR on singletons, we have to be more stringent on each sample and thus lose power. Combining single-sample calls naively would not increase power on shared SNPs. This is where multi-sample calling does better: by taking advantage of correlation between samples, we are able to call a SNP if it appears in multiple samples, even if it is too weak to call in each sample individually. Joint calling is particularly preferable if we have multiple low-coverage samples for which single-sample calling does not work well. It is also able to reveal some artifacts only detectable with many samples.
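The FAQ's two arguments can be made concrete with a toy calculation. This is a sketch under stated assumptions, not the model samtools/bcftools actually uses (real callers work with genotype likelihoods). `combined_false_fraction` assumes false calls are sample-specific and simply add up when call sets are merged, while true SNPs are shared and collapse into a hypothetical number of distinct sites. `binom_tail` is a simplified stand-in for "joint" evidence: it pools alternate-allele read counts across samples and asks how unlikely they are under a hypothetical 1% per-read error rate. All function names, counts and cutoffs here are illustrative.

```python
from math import comb


def combined_false_fraction(per_sample_calls, per_sample_fpr, n_samples, union_true_snps):
    """Toy false-call fraction after merging single-sample call sets.

    Hypothetical model: false calls never coincide across samples, so
    they add up; true SNPs are shared and collapse into
    `union_true_snps` distinct sites in the merged set.
    """
    false_per_sample = per_sample_calls * per_sample_fpr
    total_false = false_per_sample * n_samples
    return total_false / (union_true_snps + total_false)


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 1,000 calls per sample at 1% FPR -> 10 false calls per sample.
# Over 100 samples that is 1,000 false calls; if the true SNPs collapse
# to a hypothetical 5,000 distinct sites, the merged false fraction is 1/6.
print(combined_false_fraction(1000, 0.01, 100, 5000))

# Toy joint signal: 2 alt reads out of 10 in each of 5 samples,
# tested against a hypothetical 1% error rate with a 1e-6 cutoff.
ERROR_RATE, CUTOFF = 0.01, 1e-6
single = binom_tail(2, 10, ERROR_RATE)   # one sample on its own
pooled = binom_tail(10, 50, ERROR_RATE)  # same evidence pooled over 5 samples
print(single < CUTOFF, pooled < CUTOFF)  # False True
```

With these made-up numbers the merged false fraction is about 0.167, inside the FAQ's 10-20% range. It would stay at 1% if no true SNPs were shared and rise toward 50% if all were. The single-sample tail probability is roughly 0.004, far above the cutoff, while the pooled one is about 7e-11. This mirrors the FAQ's point that a SNP too weak to call in any one sample can become callable when the samples are considered together. The pooling step assumes the variant is present in every pooled sample, which a real joint caller does not assume.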

RNA-Seq samtools