Question: How does samtools mpileup handle multiple .bam inputs?
Asked 2.1 years ago by traviata:

With samtools mpileup you can use multiple .bam files as inputs. When samtools computes depth, are these files simply concatenated, or is there a special way samtools synthesizes the data from multiple .bam files?
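For context on what "multiple inputs" looks like in the output: as far as I know, the plain-text pileup does not concatenate the files. After the first three columns (chromosome, position, reference base), each input .bam gets its own group of three columns (depth, read bases, base qualities). The sketch below pulls per-file depths out of such a line. The function name and the sample line are made up for illustration, and it assumes default output only, with no options that add extra per-file columns.

```python
def per_file_depths(pileup_line):
    """Split one text pileup line into (chrom, pos, ref, [depth per input file]).

    Assumes the default layout: 3 leading columns, then exactly
    3 columns (depth, bases, qualities) for each input BAM.
    """
    fields = pileup_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    per_file = fields[3:]
    if len(per_file) == 0 or len(per_file) % 3 != 0:
        raise ValueError("expected 3 columns per input file after chrom/pos/ref")
    # Every third column, starting at the first per-file column, is a depth.
    depths = [int(per_file[i]) for i in range(0, len(per_file), 3)]
    return chrom, pos, ref, depths


# Hypothetical line for three input files; the second has no coverage here.
line = "chr1\t1000\tA\t3\t.,.\tIII\t0\t*\t*\t2\t.G\tII"
chrom, pos, ref, depths = per_file_depths(line)
print(chrom, pos, ref, depths, "pooled depth:", sum(depths))
```

If a single pooled depth is wanted, it has to be summed across the per-file columns as in the last line; it is not what the output reports directly.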

The samtools GitHub FAQ has an entry that seems related, but I wasn't sure how to interpret it:

  1. Between single- and multi-sample variant calling, which is preferred?

By using multi-sample calling, we gain power on SNPs shared between samples, but lose power on singleton SNPs. Here is a way of thinking of this. Suppose we have 1% false positive rate (FPR) for variant calling from one sample. If we call SNPs from 100 samples separately and then combine the calls, the FPR would be around 10-20% (not 100% because more SNPs are found given 100 samples). To retain an acceptable FPR on singletons, we have to be more stringent on each sample and thus lose power. Combining single-sample calls naively would not increase power on shared SNPs. This is where multi-sample calling does better: by taking the advantage of correlation between samples, we are able to call a SNP if it appears in multiple samples, but too weak to call in each sample individually. Joint calling is particularly preferable if we have multiple low-coverage samples for which single-sample calling does not work well. It is also able to reveal some artifacts only detectable with many samples.
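One way to read the 10-20% figure in the FAQ text above is with a toy model. This is only a sketch under assumptions that are mine, not the FAQ's: false positives never coincide between samples, so they simply add up, while true SNPs are shared, so the union of true calls grows by only a small factor (the hypothetical `true_growth` parameter) relative to one sample.

```python
def combined_fpr(per_sample_fpr, n_samples, true_growth):
    """Fraction of false calls in the union of per-sample call sets (toy model).

    Counts are in units of one sample's total calls.
    Assumption: false positives are distinct across samples, so they sum.
    Assumption: the union of true SNPs is `true_growth` times one sample's.
    """
    false_calls = per_sample_fpr * n_samples
    true_calls = (1.0 - per_sample_fpr) * true_growth
    return false_calls / (false_calls + true_calls)


# 1% FPR per sample, 100 samples, union of true SNPs 4x to 9x a single sample
for growth in (4, 9):
    print(growth, round(combined_fpr(0.01, 100, growth), 3))
```

Under these assumptions the combined rate lands at roughly 20% and 10% for growth factors of 4 and 9. It stays below 100% only because the set of true SNPs also grows as samples are added, which matches the FAQ's parenthetical remark.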


A similar thread that may be of interest: "Samtools: merge and mpileup vs mpileup alone for variant-calling with multiple BAM"

Comment written 18 months ago by Kevin Blighe