Will Samtools Mpileup Produce Different Variant Calls On A Subset Of The Same Data?
Entering edit mode
9.3 years ago
mark.dunning ▴ 230

Hi al,

I'm looking to parallelise my variant-calling using mpileup on a per-chromosome basis as to run in serial for a whole genome takes many, many days.

However, I've noticed that I get different variants called in the two separate analyses. Which is a bit disconcerting!

As a test, I ran the following two commands on chromosome Y for a pair of sample;

samtools mpileup -r chrY:50000000-60000000 -u -q 0 -f homo_sapiens.fa tumour.bam normal.bam | bcftools/bcftools view -bcvg - > chrY.sub.bcf
bcftools/bcftools view chrY.sub.bcf > chrY.sub.vcf

samtools mpileup -r chrY -u -q 0 -f homo_sapiens.fa tumour.bam normal.bam | bcftools view -bcg - > norecal.chrY.full.allPos.bcf
bcftools view chrY.full.allPos.bcf > chrY.full.allPos.vcf

There are 25 variants in the subset that are not called in the full-chromosome analysis. The calls look to be poor quality, so I guess I'm not too worried about them being lost in the analysis of the whole chromosome. But I still would like to get to the bottom of the differences.

They all seem to be quite high-depth, so I'm wondering if mpileup is doing down-sampling in the variant calling, which leads to different reads being used in the analysis each time? Would be BAQ calculation also depend on the total number of reads that you had.

Here I show output for two discrepant calls.

Variant call in the subset

chrY    58828570    .    C    A    4.15    .    DP=1626;VDB=0.0394;AF1=1;AC1=4;DP4=149,125,620,594;MQ=0;FQ=-52.1;PV4=0.35,0,1,1    GT:PL:GQ    1/1:15,255,0:19    1/1:23,255,0:27

Output in the full-chromosome example

chrY    58828570    .    C    .    3.52    .    DP=1626;VDB=0.0394;AF1=1;AC1=4;DP4=149,125,620,594;MQ=0;FQ=-52;PV4=0.35,0,1,1    PL    19    16

Variant call in the subset

chrY    58839518    .    G    T    3.56    .    DP=886;VDB=0.0020;AF1=1;AC1=4;DP4=90,97,289,343;MQ=0;FQ=-52;PV4=0.62,1.8e-287,1,0.0042    GT:PL:GQ    1/1:22,238,0:24    1/1:15,202,0:17

Output in the full-chromosome example

chrY    58839518    .    G    .    5.41    .    DP=886;VDB=0.0020;AF1=1;AC1=4;DP4=90,97,289,343;MQ=0;FQ=-50.7;PV4=0.62,1.8e-287,1,0.0042    PL    17    15
samtools mpileup • 3.9k views
Entering edit mode

can you post the actual commands you used? in the first command, you're sending a .bcf file in after the -f flag. that should be a fasta.


Login before adding your answer.

Traffic: 889 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6