Question: Meaning of the first column of variants filtered out by vcfutils.pl varFilter -p
bioinformatica2005 wrote (8.3 years ago):

Hello all,

What are the possible values of the first column, and what do they mean, in the output (written to STDERR) that lists the variants filtered out when the '-p' option is used, i.e. varFilter -p with STDERR redirected to a file of filtered-out variants?

An example of this output (four rejected variants, whose first-column values are "P", "G", "g" and "a", respectively):

P chr01 4572 . G T 225 . DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05 GT:PL:GQ 0/1:255,0,255:99

G chr01 6691 . T A 13.2 . DP=43;VDB=0.0000;AF1=0.5;AC1=1;DP4=20,20,2,1;MQ=60;FQ=16.1;PV4=1,0.13,1,0.00065 GT:PL:GQ 0/1:43,0,255:46

g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5 . INDEL;DP=572;VDB=0.0367;AF1=1;AC1=2;DP4=0,7,14,165;MQ=60;FQ=-290;PV4=1,1,1,1 GT:PL:GQ 1/1:107,255,0,110,255,98:99

a chr01 115171 . C A 6.98 . DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30 GT:PL:GQ 1/1:36,3,0:4

By inspecting the source of vcfutils.pl, it seems that the possible values are the letters in "UQdDaGgPMS". I suppose each letter stands for a different reason for filtering out the variant, but I do not know the meaning of each one.
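Before interpreting the letters, it can help to see how often each one occurs in a real run. The sketch below assumes that each filtered record is the one-letter flag followed by the original VCF columns, separated by whitespace (tabs in a real file), as in the example above. The helper names split_filtered_record and tally_filter_flags are hypothetical, not part of samtools.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_filtered_record(line: str) -> Tuple[str, List[str]]:
    """Split one -p record into (flag, original VCF fields)."""
    fields = line.split()  # handles tabs or spaces
    return fields[0], fields[1:]


def tally_filter_flags(lines: Iterable[str]) -> Counter:
    """Count how many filtered records carry each one-letter flag."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():  # the pasted example has blank separator lines
            continue
        flag, _vcf_fields = split_filtered_record(line)
        counts[flag] += 1
    return counts


# Abbreviated versions of the four example records (leading columns only).
example = [
    "P chr01 4572 . G T 225",
    "G chr01 6691 . T A 13.2",
    "g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5",
    "a chr01 115171 . C A 6.98",
]
print(tally_filter_flags(example))
```

On a real run one would capture STDERR first (for example vcfutils.pl varFilter -p in.vcf > kept.vcf 2> filtered.txt, with hypothetical file names) and pass the open file handle to tally_filter_flags.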

Thank you in advance


Tags: samtools, mpileup
Ashutosh Pandey answered (8.3 years ago):

Running varFilter with -h gives the following output:

Options: -Q INT    minimum RMS mapping quality for SNPs [10]
         -d INT    minimum read depth [2]
         -D INT    maximum read depth [10000000]
         -a INT    minimum number of alternate bases [2]
         -w INT    SNP within INT bp around a gap to be filtered [3]
         -W INT    window size for filtering adjacent gaps [10]
         -1 FLOAT  min P-value for strand bias (given PV4) [0.0001]
         -2 FLOAT  min P-value for baseQ bias [1e-100]
         -3 FLOAT  min P-value for mapQ bias [0]
         -4 FLOAT  min P-value for end distance bias [0.0001]
         -e FLOAT  min P-value for HWE (plus F<0) [0.0001]
         -p        print filtered variants

I assume the letters in "UQdDaGgPMS" follow the same order as the options, but apart from a few (a, d, D and the like), the one-letter codes used to flag filtered variants do not match the option names shown in the usage.
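Under that same-order assumption, the letters can be lined up against the options as a lookup table. This is a sketch of the inference made in this thread, not documented samtools behaviour: it assumes the script picks the printed letter by its position in "UQdDaGgPMS", and the names FLAG_ALPHABET, INFERRED_MEANING and decode_flag are hypothetical. U, M and S are left as None because nothing here establishes what they mean.

```python
from typing import Optional, Tuple

# The flag alphabet as quoted from the vcfutils.pl source in the question.
FLAG_ALPHABET = "UQdDaGgPMS"

# Inferred (not documented) meanings, lined up with the -h usage order.
INFERRED_MEANING = {
    "U": None,                                   # unresolved in this thread
    "Q": "-Q: RMS mapping quality below minimum",
    "d": "-d: read depth below minimum",
    "D": "-D: read depth above maximum",
    "a": "-a: too few alternate bases",
    "G": "-w: SNP within INT bp of a gap (indel)",
    "g": "-W: gap within the adjacent-gap window",
    "P": "-1..-4: a PV4 bias P-value below its threshold",
    "M": None,                                   # unresolved in this thread
    "S": None,                                   # unresolved in this thread
}


def decode_flag(flag: str) -> Tuple[int, Optional[str]]:
    """Return (position in FLAG_ALPHABET, inferred meaning or None)."""
    if len(flag) != 1 or flag not in FLAG_ALPHABET:
        raise ValueError(f"unknown varFilter flag: {flag!r}")
    return FLAG_ALPHABET.index(flag), INFERRED_MEANING[flag]


for letter in "PGga":
    print(letter, decode_flag(letter))
```

Keeping the unknown letters as None rather than guessing makes it obvious which flags still need checking against the script itself.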


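P - one of the PV4 bias filters (-1 to -4). This follows from the usage order and from the example you posted with the P flag. Note that in that record the strand-bias P-value (0.0033, the first PV4 value) is above the -1 threshold of 0.0001; the value that fails is the fourth, end distance bias (1.5e-05 < 0.0001). Go to the link I have mentioned at the bottom to see where the PV4 information is described.

G - I assume G corresponds to -w INT, i.e. a SNP within INT bp around a gap (indel) is filtered [3]. The G-flagged example is a SNP, which fits. Check the adjacent variants to confirm it.

g - I assume g corresponds to -W, the window size for filtering adjacent gaps. The g-flagged example is itself an indel (INDEL in the INFO field), which fits. You can easily check it.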
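The P reasoning can be reproduced by parsing PV4 and comparing each value with the defaults from the usage. The sketch assumes the four PV4 values are in the same order as the -1 to -4 options (strand, baseQ, mapQ, end distance bias) and that a value strictly below its threshold fails; parse_info and failed_pv4_tests are hypothetical helpers, not the script's own code.

```python
from typing import Dict, List, Optional

# Default thresholds from the varFilter -h usage above.
PV4_DEFAULTS = {
    "-1": 1e-4,    # strand bias
    "-2": 1e-100,  # baseQ bias
    "-3": 0.0,     # mapQ bias
    "-4": 1e-4,    # end distance bias
}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into key -> value; bare flags such as INDEL map to ''."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def failed_pv4_tests(info: str, thresholds: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the options whose PV4 P-value falls below its threshold."""
    thresholds = thresholds or PV4_DEFAULTS
    fields = parse_info(info)
    if "PV4" not in fields:
        return []
    pvalues = [float(x) for x in fields["PV4"].split(",")]
    options = ["-1", "-2", "-3", "-4"]  # assumed to match the PV4 order
    return [opt for opt, p in zip(options, pvalues) if p < thresholds[opt]]


p_record = "DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05"
g_record = "DP=43;VDB=0.0000;AF1=0.5;AC1=1;DP4=20,20,2,1;MQ=60;FQ=16.1;PV4=1,0.13,1,0.00065"
print(failed_pv4_tests(p_record))
print(failed_pv4_tests(g_record))
```

Under these assumptions the P-flagged record fails only -4, while the G-flagged record passes all four PV4 tests, which is consistent with it being flagged for a different reason.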

As the usage shows, a is the minimum number of alternate bases (-a, default 2). For the a-flagged variant you posted, DP4=0,0,1,0 gives only one alternate base (the last two DP4 values), so it has been flagged.
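The depth-type filters can be checked the same way. The sketch reads DP and DP4 from the INFO column (DP4 holds the high-quality ref-forward, ref-reverse, alt-forward and alt-reverse base counts) and reports which of -d, -D and -a would fail under the default thresholds. failed_depth_tests is a hypothetical helper; it only lists failures and does not model how varFilter chooses the single letter it prints.

```python
from typing import Dict, List

# Defaults from the varFilter -h usage above.
DEPTH_DEFAULTS = {"d": 2, "D": 10_000_000, "a": 2}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into key -> value; bare flags map to ''."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def failed_depth_tests(info: str) -> List[str]:
    """Return which of -d, -D, -a the record would fail (defaults only)."""
    fields = parse_info(info)
    failed = []
    if "DP" in fields:                      # total read depth
        dp = int(fields["DP"])
        if dp < DEPTH_DEFAULTS["d"]:
            failed.append("-d")
        if dp > DEPTH_DEFAULTS["D"]:
            failed.append("-D")
    if "DP4" in fields:                     # ref-fwd, ref-rev, alt-fwd, alt-rev
        _ref_f, _ref_r, alt_f, alt_r = (int(x) for x in fields["DP4"].split(","))
        if alt_f + alt_r < DEPTH_DEFAULTS["a"]:
            failed.append("-a")
    return failed


print(failed_depth_tests("DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30"))
```

On the a-flagged record this reports both -d (DP=1) and -a, yet the record is printed with a only, which suggests that varFilter reports one code per record; which code takes precedence is not something this sketch tries to reproduce.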

To understand the VCF format produced by samtools, see the samtools documentation and go to its VCF format section.


Good explanation, although the meaning of the tags "U", "M" and "S" (if they really occur) is still not clear. Does "M" mean "MAPQ bias"? I had already checked the samtools documentation you cite and run varFilter -h, but I do not see the answer there. Thank you so much for your time, ashutoshmits.

(Reply written 8.3 years ago by bioinformatica2005.)