Question: Will Samtools Mpileup Produce Different Variant Calls On A Subset Of The Same Data?
gravatar for mark.dunning
5.8 years ago by
The University of Sheffield
mark.dunning200 wrote:

Hi al,

I'm looking to parallelise my variant-calling using mpileup on a per-chromosome basis as to run in serial for a whole genome takes many, many days.

However, I've noticed that I get different variants called in the two separate analyses. Which is a bit disconcerting!

As a test, I ran the following two commands on chromosome Y for a pair of sample;

samtools mpileup -r chrY:50000000-60000000 -u -q 0 -f homo_sapiens.fa tumour.bam normal.bam | bcftools/bcftools view -bcvg - > chrY.sub.bcf
bcftools/bcftools view chrY.sub.bcf > chrY.sub.vcf

samtools mpileup -r chrY -u -q 0 -f homo_sapiens.fa tumour.bam normal.bam | bcftools view -bcg - > norecal.chrY.full.allPos.bcf
bcftools view chrY.full.allPos.bcf > chrY.full.allPos.vcf

There are 25 variants in the subset that are not called in the full-chromosome analysis. The calls look to be poor quality, so I guess I'm not too worried about them being lost in the analysis of the whole chromosome. But I still would like to get to the bottom of the differences.

They all seem to be quite high-depth, so I'm wondering if mpileup is doing down-sampling in the variant calling, which leads to different reads being used in the analysis each time? Would be BAQ calculation also depend on the total number of reads that you had.

Here I show output for two discrepant calls.

Variant call in the subset

chrY    58828570    .    C    A    4.15    .    DP=1626;VDB=0.0394;AF1=1;AC1=4;DP4=149,125,620,594;MQ=0;FQ=-52.1;PV4=0.35,0,1,1    GT:PL:GQ    1/1:15,255,0:19    1/1:23,255,0:27

Output in the full-chromosome example

chrY    58828570    .    C    .    3.52    .    DP=1626;VDB=0.0394;AF1=1;AC1=4;DP4=149,125,620,594;MQ=0;FQ=-52;PV4=0.35,0,1,1    PL    19    16

Variant call in the subset

chrY    58839518    .    G    T    3.56    .    DP=886;VDB=0.0020;AF1=1;AC1=4;DP4=90,97,289,343;MQ=0;FQ=-52;PV4=0.62,1.8e-287,1,0.0042    GT:PL:GQ    1/1:22,238,0:24    1/1:15,202,0:17

Output in the full-chromosome example

chrY    58839518    .    G    .    5.41    .    DP=886;VDB=0.0020;AF1=1;AC1=4;DP4=90,97,289,343;MQ=0;FQ=-50.7;PV4=0.62,1.8e-287,1,0.0042    PL    17    15
samtools mpileup • 2.5k views
ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by mark.dunning200

can you post the actual commands you used? in the first command, you're sending a .bcf file in after the -f flag. that should be a fasta.

ADD REPLYlink written 5.8 years ago by brentp22k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 769 users visited in the last hour