Interpreting Results From Mpileup
7.3 years ago
Dataminer ★ 2.7k

Hi!

I am stuck on interpreting a variant call I made on two BAM files (two samples, say sample1 and sample2) for a specific region.

The command I used:

samtools mpileup -uf hg19.fa sample1.bam sample2.bam -r Chromosomal_Region | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf


I get a result which looks like this:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1    Sample2
chrxyz    74311283    0    A    G    4.61    0    DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7    GT:PL:DP:GQ    0/1:0,0,0:0:3    1/1:34,6,0:2:4
chrxyz    74311467    0    G    A    70.2    0    DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3    GT:PL:DP:GQ    1/1:67,6,0:2:12    1/1:37,3,0:1:9


Please ignore the value in the chromosome column. The FILTER column shows 0, and I don't know whether these calls can be trusted or should be discarded. My gut feeling and my limited knowledge of SNP calling suggest that I should take the second SNP and follow it up with the Variant Effect Predictor from Ensembl.

Any help in interpreting the results shown here would be appreciated, and suggestions for further analysis (such as in silico analysis of these variants) are also welcome.

Thank you

samtools variant-calling • 4.3k views

Given the sequencing depth (max 2 in a sample), I'd hesitate to follow up on either of those.


Could you please elaborate on the terms here? How did you know the sequencing depth in these regions: was it from DP? Which terms indicate the mapping quality, and which are the critical ones in this result? Kindly share your knowledge. Thank you.


Sure (BTW, read ashutoshmits's answer, which is quite good!), though these are normally defined in the header portion of the VCF file (maybe that's not printed with your procedure; I've usually used GATK). DP is the depth. The value in the INFO column is the sum of the depths for each of the samples (see the DP part for each of them, where the parameters are ":"-separated). A fuller description of the VCF fields from samtools is available on the samtools website (scroll down to "Understanding the output: the VCF/BCF format").
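To make the DP bookkeeping concrete, here is a minimal Python sketch that splits the second record from the question into its INFO and per-sample fields. The helper names (parse_info, parse_samples) are made up for illustration, and the record is rebuilt with tabs on the assumption that the original file was a normal tab-separated VCF.

```python
# Second record from the question, rebuilt as a tab-separated VCF line (assumption).
record = "\t".join([
    "chrxyz", "74311467", "0", "G", "A", "70.2", "0",
    "DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3",
    "GT:PL:DP:GQ", "1/1:67,6,0:2:12", "1/1:37,3,0:1:9",
])

def parse_info(info):
    """Turn 'K=V;K=V;FLAG' into a dict (bare flags map to True)."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def parse_samples(format_col, sample_cols):
    """Pair the ':'-separated FORMAT keys with each sample's values."""
    keys = format_col.split(":")
    return [dict(zip(keys, col.split(":"))) for col in sample_cols]

fields = record.split("\t")
info = parse_info(fields[7])                     # INFO column
samples = parse_samples(fields[8], fields[9:])   # FORMAT + sample columns

per_sample_dp = [int(s["DP"]) for s in samples]
print("INFO DP:", info["DP"])
print("per-sample DP:", per_sample_dp)
```

For this record the per-sample depths are 2 and 1, which add up to the DP=3 in the INFO column.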

7.3 years ago

Samtools mpileup and bcftools only call SNPs and indels; they don't do any filtering of the variants. The "FILTER" column is part of the VCF format, and filtering tools such as snpEff, GATK VariantFiltration and VCFtools use it as a flag to record whether a particular variant passed the filtering criteria provided by the user. Normally for SNPs you require at least 5 reads with unique start positions that confirm the alternate allele; some people use 3 reads. Other criteria such as mapping quality and strand bias are used too; see the VCFtools documentation. In my opinion, neither of the SNPs above is a high-quality call. The mapping qualities are good, but there are too few supporting reads. Also, the supporting reads aligned to only one strand.
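As a rough illustration of the read-support and strand checks described above, the sketch below reads the DP4 entry (high-quality ref-forward, ref-reverse, alt-forward, alt-reverse counts) from the two INFO strings in the question. The function names and the MIN_ALT_READS constant are hypothetical, and DP4 cannot tell whether reads have unique start positions, so this is only a first-pass check and not a replacement for a real filtering tool.

```python
MIN_ALT_READS = 5  # threshold suggested in the answer; some people use 3

def alt_support(info_field):
    """Return (alt_forward, alt_reverse) from the DP4 entry of an INFO string."""
    for item in info_field.split(";"):
        if item.startswith("DP4="):
            # DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse
            _ref_fwd, _ref_rev, alt_fwd, alt_rev = map(int, item[4:].split(","))
            return alt_fwd, alt_rev
    raise ValueError("no DP4 entry in INFO field")

def check_site(info_field, min_alt=MIN_ALT_READS):
    """Summarise alt-allele read support and whether both strands are seen."""
    alt_fwd, alt_rev = alt_support(info_field)
    return {
        "alt_reads": alt_fwd + alt_rev,
        "enough_reads": alt_fwd + alt_rev >= min_alt,
        "both_strands": alt_fwd > 0 and alt_rev > 0,
    }

# INFO strings copied from the two records in the question, keyed by POS.
sites = {
    74311283: "DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7",
    74311467: "DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3",
}
for pos, info in sites.items():
    print(pos, check_site(info))
```

Both sites fall short of the read threshold (2 and 3 alternate reads), and in each case the alternate reads come from a single strand.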

7.3 years ago
swbarnes2 9.7k

The low score in the QUAL column (and the GQ entry) is a red flag, as is the low depth. Total depth can be seen in the DP value; the depth after low-quality reads are filtered out is in the DP4 info. That's all explained in the VCF standard, which, if the fields aren't explained in the headers of your VCF, can be found by googling.

Based on that data alone, you probably should not conclude anything at all about what the sequence is there.
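For anyone who wants to see QUAL, DP and DP4 side by side, here is a small Python sketch over the two records from the question. The summarize and info_value helpers are hypothetical names used only for this illustration, and no quality cutoff is applied since none is given in this thread.

```python
# (POS, QUAL, INFO) copied from the two records in the question.
rows = [
    (74311283, "4.61", "DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7"),
    (74311467, "70.2", "DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3"),
]

def info_value(info, key):
    """Return the string value for `key` in a ';'-separated INFO field."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    raise KeyError(key)

def summarize(row):
    """Collect QUAL, total depth (DP) and high-quality depth (sum of DP4)."""
    pos, qual, info = row
    dp4 = [int(x) for x in info_value(info, "DP4").split(",")]
    return {
        "pos": pos,
        "qual": float(qual),
        "dp": int(info_value(info, "DP")),
        "dp4_total": sum(dp4),
    }

for row in rows:
    print(summarize(row))
```

Here the DP4 totals equal DP at both sites, so no reads were dropped as low quality; the depth is simply very low to begin with.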

7.3 years ago
Jordan ★ 1.2k

I think you should proceed further using ANNOVAR. It's an annotation tool that also helps with filtering based on quality, depth, etc. The two SNPs you gave are of poor quality, but you might have more SNPs that are of good quality.