Meaning of the first column of the variants filtered out by vcfutils.pl varFilter -p
11.5 years ago

Hello all,

What are the possible values, and their meanings, of the first column in the output (STDERR) listing the filtered-out variants when the option '-p' is used in:

vcfutils.pl varFilter -p [file_with_the_filtered_out_variants] ?

An example of this output (4 rejected variants, with field-1 values "P", "G", "g" and "a", respectively) is:

P chr01 4572 . G T 225 . DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05 GT:PL:GQ 0/1:255,0,255:99

G chr01 6691 . T A 13.2 . DP=43;VDB=0.0000;AF1=0.5;AC1=1;DP4=20,20,2,1;MQ=60;FQ=16.1;PV4=1,0.13,1,0.00065 GT:PL:GQ 0/1:43,0,255:46

g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5 . INDEL;DP=572;VDB=0.0367;AF1=1;AC1=2;DP4=0,7,14,165;MQ=60;FQ=-290;PV4=1,1,1,1 GT:PL:GQ 1/1:107,255,0,110,255,98:99

a chr01 115171 . C A 6.98 . DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30 GT:PL:GQ 1/1:36,3,0:4

By inspecting the source of vcfutils.pl it seems that the possible values are these letters: "UQdDaGgPMS". I suppose that each letter is a different reason for filtering out the variant but I do not know the meaning of each one.
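As a first step toward working out what each letter means, it can help to see which codes actually occur in a given run and how often. The following is a minimal Python sketch, not part of vcfutils.pl; the function name `tally_filter_codes` is made up for illustration, and it only assumes that each rejected line starts with the one-letter code followed by whitespace, as in the four example records above.

```python
from collections import Counter


def tally_filter_codes(lines):
    """Count the first-column code of each rejected variant line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and any header lines
        counts[fields[0]] += 1
    return counts


# The four rejected variants quoted in the question.
example = [
    "P chr01 4572 . G T 225 . DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05 GT:PL:GQ 0/1:255,0,255:99",
    "G chr01 6691 . T A 13.2 . DP=43;VDB=0.0000;AF1=0.5;AC1=1;DP4=20,20,2,1;MQ=60;FQ=16.1;PV4=1,0.13,1,0.00065 GT:PL:GQ 0/1:43,0,255:46",
    "g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5 . INDEL;DP=572;VDB=0.0367;AF1=1;AC1=2;DP4=0,7,14,165;MQ=60;FQ=-290;PV4=1,1,1,1 GT:PL:GQ 1/1:107,255,0,110,255,98:99",
    "a chr01 115171 . C A 6.98 . DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30 GT:PL:GQ 1/1:36,3,0:4",
]

for code, n in sorted(tally_filter_codes(example).items()):
    print(code, n)
```

For a real run, the same function can be given an open file handle on the captured STDERR output instead of the `example` list.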

Thank you in advance

Rafael

samtools mpileup • 3.9k views
11.5 years ago

Running the help command -h for varFilter gives you the following output.

Options: -Q INT    minimum RMS mapping quality for SNPs [10]
         -d INT    minimum read depth [2]
         -D INT    maximum read depth [10000000]
         -a INT    minimum number of alternate bases [2]
         -w INT    SNP within INT bp around a gap to be filtered [3]
         -W INT    window size for filtering adjacent gaps [10]
         -1 FLOAT  min P-value for strand bias (given PV4) [0.0001]
         -2 FLOAT  min P-value for baseQ bias [1e-100]
         -3 FLOAT  min P-value for mapQ bias [0]
         -4 FLOAT  min P-value for end distance bias [0.0001]
         -e FLOAT  min P-value for HWE (plus F<0) [0.0001]
         -p        print filtered variants

I assume the letters in UQdDaGgPMS follow the same order as these options, but apart from a few such as a, D and d, the short codes used to flag the filtered variants do not match the option letters in the usage.

Answer:

P - Strand bias, or more generally one of the PV4 bias tests (options -1 to -4). I infer this from the order of the options in the usage and from the example variant you posted with the P flag. The link at the bottom explains where to find the strand bias information.

G - I assume G corresponds to -w INT (or -W INT), i.e. a SNP within INT bp around a gap is filtered [3]. Check the adjacent variants to confirm it.

g - Either -W or -w. You can easily check it.

As you can see above, 'a' is the minimum number of alternate bases. The default is 2. For the variant you posted, the value is 1, and therefore it has been flagged.
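The reasoning for 'a' and 'P' can be re-checked against the example records with a short Python sketch. This is not the code of vcfutils.pl; it only re-applies the default thresholds quoted in the usage text. It assumes that the four PV4 values are listed in the same order as options -1 to -4 (strand bias, baseQ bias, mapQ bias, end distance bias) and that the last two DP4 values are the alternate-allele read counts. The function names are made up for illustration.

```python
import re

# Defaults copied from the usage text above (-a and -1 .. -4).
MIN_ALT_BASES = 2
PV4_MIN = (1e-4, 1e-100, 0.0, 1e-4)
# Assumed order of the PV4 values, matching options -1 .. -4.
PV4_NAMES = ("strand bias", "baseQ bias", "mapQ bias", "end distance bias")


def alt_base_count(info):
    """Sum the two alternate-allele counts of DP4, or None if DP4 is absent."""
    m = re.search(r"DP4=(\d+),(\d+),(\d+),(\d+)", info)
    if m is None:
        return None
    return int(m.group(3)) + int(m.group(4))


def failing_pv4_tests(info):
    """Name the PV4 tests whose P-value is below its default minimum."""
    m = re.search(r"PV4=([^,;]+),([^,;]+),([^,;]+),([^,;]+)", info)
    if m is None:
        return []
    values = [float(v) for v in m.groups()]
    return [name for name, v, lo in zip(PV4_NAMES, values, PV4_MIN) if v < lo]


# INFO fields of the 'P' and 'a' examples from the question.
p_info = "DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05"
a_info = "DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30"

print(alt_base_count(a_info) < MIN_ALT_BASES)  # fewer alternate bases than -a?
print(failing_pv4_tests(p_info))               # which PV4 tests fall below the minimum?
```

Under these assumptions, the 'a' record has a single alternate base, below the default of 2. For the 'P' record, the strand bias value (0.0033) is above its 0.0001 minimum, and the value that falls below its threshold is the fourth one (1.5e-05), so 'P' seems to cover any of the PV4 tests rather than strand bias alone.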

To understand the VCF format produced by samtools, see http://samtools.sourceforge.net/samtools.shtml and go to the VCF format section.

Good explanation, although the meaning of the tags "U", "M" and "S" (if they really exist) is still not clear. Does "M" mean "MAPQ bias"? I had already consulted the samtools documentation you cite and run varFilter -h, but I do not see the answer there. Thank you so much for your time, ashutoshmits.
