How does samtools mpileup handle multiple .bam inputs?
0
0
Entering edit mode
6.8 years ago
traviata ▴ 20

With samtools mpileup you can use multiple .bam files as inputs. When samtools computes depth, are these files simply concatenated, or is there a special way samtools synthesizes the data from multiple .bam files?

The samtools github faq seemed to have something about this, but I wasn't exactly sure how to interpret what they were saying:

  1. Between single- and multi-sample variant calling, which is preferred?

By using multi-sample calling, we gain power on SNPs shared between samples, but lose power on singleton SNPs. Here is a way of thinking of this. Suppose we have 1% false positive rate (FPR) for variant calling from one sample. If we call SNPs from 100 samples separately and then combine the calls, the FPR would be around 10-20% (not 100% because more SNPs are found given 100 samples). To retain an acceptable FPR on singletons, we have to be more stringent on each sample and thus lose power. Combining single-sample calls naively would not increase power on shared SNPs. This is where multi-sample calling does better: by taking the advantage of correlation between samples, we are able to call a SNP if it appears in multiple samples, but too weak to call in each sample individually. Joint calling is particularly preferable if we have multiple low-coverage samples for which single-sample calling does not work well. It is also able to reveal some artifacts only detectable with many samples.

RNA-Seq samtools • 5.2k views
ADD COMMENT
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1549 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6