Question: Interpreting Results From Mpileup
Dataminer (2.7k) wrote 7.0 years ago:


I am stuck interpreting a variant call I made on two BAM files (two samples).

Call them sample1 and sample2; the call covers a specific region.

The command I used:

samtools mpileup -uf hg19.fa sample1.bam sample2.bam -r Chromosomal_Region | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

I get a result which looks like this:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1    Sample2
chrxyz    74311283    0    A    G    4.61    0    DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7    GT:PL:DP:GQ    0/1:0,0,0:0:3    1/1:34,6,0:2:4
chrxyz    74311467    0    G    A    70.2    0    DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3    GT:PL:DP:GQ    1/1:67,6,0:2:12    1/1:37,3,0:1:9

Please ignore the value in the chromosome column. The FILTER column contains 0, and I don't know whether that means the calls can be trusted or should be discarded. My gut feeling, based on my limited knowledge of SNP calling, is to take the second SNP and follow it up with the variant_effect_predictor from Ensembl.

Any help in interpreting these results would be appreciated, and suggestions for further analysis (such as in silico analysis of these variants) are also welcome.

Thank you

variant-calling samtools • 4.2k views
Modified 7.0 years ago by Jordan (1.2k) • written 7.0 years ago by Dataminer (2.7k)

Given the sequencing depth (at most 2 reads in a sample), I'd hesitate to follow up on either of those.

Reply written 7.0 years ago by Devon Ryan (98k)

Could you please explain these terms, and how you determined the sequencing depth in these regions? Was it from DP? Which fields indicate the mapping quality, and which fields in the result are the critical ones? Thank you for sharing your knowledge.

Reply written 7.0 years ago by Dataminer (2.7k)

Sure (BTW, read ashutoshmits' answer, which is quite good!). These fields are normally defined in the header portion of the VCF file (maybe the header isn't printed by this procedure; I've usually used GATK). DP is the depth: the value in the INFO column is the sum of the depths of the individual samples (see the DP entry in each sample column, where the values are ":"-separated). A fuller description of the VCF fields from samtools is available on the samtools website (scroll down to "Understanding the output: the VCF/BCF format").
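To make the DP relationship concrete, here is a minimal Python sketch that splits one of the records from the question into its INFO and per-sample fields. It is not part of samtools; the function name `parse_record` is made up for illustration, and it assumes whitespace-separated columns as in the pasted output.

```python
def parse_record(line):
    """Split one VCF data line into an INFO dict and a list of per-sample dicts."""
    cols = line.split()
    # INFO (column 8) is a ';'-separated list of KEY=VALUE pairs.
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    # FORMAT (column 9) names the ':'-separated values in each sample column.
    keys = cols[8].split(":")
    samples = [dict(zip(keys, col.split(":"))) for col in cols[9:]]
    return info, samples


# Second record from the question (position 74311467).
record = ("chrxyz 74311467 0 G A 70.2 0 "
          "DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3 "
          "GT:PL:DP:GQ 1/1:67,6,0:2:12 1/1:37,3,0:1:9")
info, samples = parse_record(record)
per_sample_dp = [int(s["DP"]) for s in samples]
print(info["DP"], per_sample_dp)
```

For this record the INFO depth is 3 and the per-sample depths are 2 and 1, so no sample has more than two reads covering the site.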

Reply written 7.0 years ago by Devon Ryan (98k)
Ashutosh Pandey (12k) wrote 7.0 years ago:

Samtools mpileup and bcftools only call SNPs and indels; they don't do any filtering of the variants. The "FILTER" column is part of the VCF format, and filtering tools such as snpEff, GATK VariantFiltration and VCFtools use it as a flag that tells whether a particular variant passed the filtering criteria provided by the user. Normally, for SNPs, you require at least 5 reads with unique start positions that confirm the alternate allele. Some people use 3 reads. Other criteria, such as mapping quality and strand bias, are used too. Read the VCFtools documentation. In my opinion, none of the SNPs above is a high-quality call. The mapping qualities are good, but the number of supporting reads is too low. Also, the supporting reads all aligned to only one strand.
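As a rough illustration of the read-support and strand criteria above, the sketch below reads the DP4 field, which samtools documents as the counts of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases. The function name `alt_support` and the `min_alt` default of 5 are assumptions taken from the rule of thumb in this answer, not settings of any tool, and a VCF record alone cannot tell you whether the reads have unique start positions.

```python
def alt_support(dp4, min_alt=5):
    """Summarise alternate-allele support from a DP4 string such as '0,0,3,0'."""
    # DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse.
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    alt_reads = alt_fwd + alt_rev
    return {
        "alt_reads": alt_reads,
        # min_alt is a hypothetical threshold (5, or 3 if you are less strict).
        "enough_reads": alt_reads >= min_alt,
        # True only if the alternate allele is seen on both strands.
        "both_strands": alt_fwd > 0 and alt_rev > 0,
    }


# DP4 values of the two records in the question.
for pos, dp4 in [(74311283, "0,0,0,2"), (74311467, "0,0,3,0")]:
    print(pos, alt_support(dp4))
```

Both records fail the 5-read rule and show the alternate allele on one strand only; the second would pass only the looser 3-read count.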

Modified 7.0 years ago • written 7.0 years ago by Ashutosh Pandey (12k)
swbarnes2 (9.4k), United States, wrote 7.0 years ago:

The low score in the QUAL column (and the GQ entry) is a red flag, as is the low depth. The total depth is given by the DP value; the depth after low-quality reads are filtered out is given by the DP4 field. That's all explained in the VCF standard, which can be found by googling if it isn't explained in the header of your VCF.

Based on that data alone, you probably should not conclude anything at all about what the sequence is there.
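A small Python sketch of that check, for anyone who wants to scan a whole file: it flags a low QUAL and reports the raw depth (DP) next to the high-quality depth (the sum of the four DP4 counts). The function name `depth_summary` and the `min_qual` cutoff of 20 are assumptions for illustration only; pick a threshold that suits your data.

```python
def depth_summary(qual, info_field, min_qual=20.0):
    """Flag a low QUAL and compare raw depth (DP) with high-quality depth (DP4)."""
    # Parse the ';'-separated KEY=VALUE pairs of the INFO column.
    info = dict(item.split("=", 1) for item in info_field.split(";") if "=" in item)
    raw_depth = int(info["DP"])
    # DP4 holds four counts of high-quality bases; their sum is the filtered depth.
    hq_depth = sum(int(x) for x in info["DP4"].split(","))
    return {
        # min_qual is a hypothetical cutoff, not a samtools default.
        "low_qual": float(qual) < min_qual,
        "raw_depth": raw_depth,
        "hq_depth": hq_depth,
    }


first = depth_summary("4.61", "DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7")
second = depth_summary("70.2", "DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3")
print(first)
print(second)
```

With this cutoff only the first record is flagged for QUAL, but both have a depth of 3 or less before and after filtering, which is the bigger problem.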

Modified 7.0 years ago • written 7.0 years ago by swbarnes2 (9.4k)
Jordan (1.2k) wrote 7.0 years ago:

I think you should proceed further using ANNOVAR. It's an annotation tool that also helps with filtering based on quality, depth, etc. The two SNPs you posted are of poor quality, but you might have other SNPs of good quality.

Written 7.0 years ago by Jordan (1.2k)