Question: Meaning of the first column of variants filtered out by vcfutils.pl varFilter -p
17 months ago, bioinformatica2005 wrote:

Hello all,

What are the possible values, and their meanings, of the first column in the output (STDERR) that lists the filtered-out variants when the '-p' option is used in:

vcfutils.pl varFilter -p [file_with_the_filtered_out_variants] ?

An example of this output (4 rejected variants, with field-1 values "P", "G", "g" and "a", respectively) is:

P chr01 4572 . G T 225 . DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05 GT:PL:GQ 0/1:255,0,255:99

G chr01 6691 . T A 13.2 . DP=43;VDB=0.0000;AF1=0.5;AC1=1;DP4=20,20,2,1;MQ=60;FQ=16.1;PV4=1,0.13,1,0.00065 GT:PL:GQ 0/1:43,0,255:46

g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5 . INDEL;DP=572;VDB=0.0367;AF1=1;AC1=2;DP4=0,7,14,165;MQ=60;FQ=-290;PV4=1,1,1,1 GT:PL:GQ 1/1:107,255,0,110,255,98:99

a chr01 115171 . C A 6.98 . DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30 GT:PL:GQ 1/1:36,3,0:4

By inspecting the source of vcfutils.pl it seems that the possible values are these letters: "UQdDaGgPMS". I suppose that each letter is a different reason for filtering out the variant but I do not know the meaning of each one.

Thank you in advance

Rafael

17 months ago, Ashutosh Pandey wrote:

Running the help command (-h) for varFilter gives you the following output.

Options: -Q INT    minimum RMS mapping quality for SNPs [10]
         -d INT    minimum read depth [2]
         -D INT    maximum read depth [10000000]
         -a INT    minimum number of alternate bases [2]
         -w INT    SNP within INT bp around a gap to be filtered [3]
         -W INT    window size for filtering adjacent gaps [10]
         -1 FLOAT  min P-value for strand bias (given PV4) [0.0001]
         -2 FLOAT  min P-value for baseQ bias [1e-100]
         -3 FLOAT  min P-value for mapQ bias [0]
         -4 FLOAT  min P-value for end distance bias [0.0001]
         -e FLOAT  min P-value for HWE (plus F<0) [0.0001]
         -p        print filtered variants

I assume the letters in UQdDaGgPMS follow the same order as the options above, but apart from a, D, d, etc., the short codes used to flag the filtered variants do not match the option letters in the usage.

Answer:

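P - One of the PV4 bias filters (-1 to -4). I inferred this from the order of the options in the usage and from the variant you posted with the P flag. Note that its first PV4 value (strand bias, 0.0033) is above the default 0.0001 cutoff, while its last PV4 value (1.5e-05) is below the 0.0001 cutoff for end distance bias, so the latter looks like the actual trigger, assuming the PV4 values follow the order of options -1 to -4. Go to the link I have mentioned at the bottom to understand where you can find the strand bias information.

G - I assume G corresponds to -w INT, i.e. a SNP within INT bp around a gap [3], since the flagged variant is a SNP. Check the adjacent variants to confirm it.

g - I assume g corresponds to -W INT, the window size for filtering adjacent gaps [10], since the flagged variant is an indel. You can easily check it.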

a - As you can see in the usage above, -a is the minimum number of alternate bases. The default is 2. For the variant you posted with the 'a' flag, DP4=0,0,1,0 gives only one alternate base, and therefore it has been flagged.
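To make the checks for the 'a' and 'P' examples concrete, here is a rough Python sketch (not taken from vcfutils.pl). It assumes that the alternate-base count is the sum of the last two DP4 values and that the four PV4 values line up with options -1 to -4 in that order; the function names are made up for illustration, and the thresholds are just the defaults from the usage text.

```python
import re

# Default thresholds copied from the varFilter usage text quoted above.
MIN_ALT_BASES = 2                     # -a
PV4_MIN = (1e-4, 1e-100, 0.0, 1e-4)   # -1, -2, -3, -4 (assumed PV4 order)

def alt_base_count(info):
    """Sum the last two DP4 counts (assumed to be the alternate bases).

    Returns None when the INFO string has no DP4 tag.
    """
    m = re.search(r"DP4=(\d+),(\d+),(\d+),(\d+)", info)
    return int(m.group(3)) + int(m.group(4)) if m else None

def failing_pv4_tests(info):
    """Return the option names whose PV4 P-value is below its threshold."""
    m = re.search(r"PV4=([^,;\s]+),([^,;\s]+),([^,;\s]+),([^,;\s]+)", info)
    if not m:
        return []
    values = [float(x) for x in m.groups()]
    return [f"-{i + 1}" for i, (v, t) in enumerate(zip(values, PV4_MIN)) if v < t]

# INFO fields of the 'a' and 'P' variants from the question.
info_a = "DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=36;FQ=-30"
info_p = "DP=194;VDB=0.0000;AF1=0.5;AC1=1;DP4=80,82,25,7;MQ=60;FQ=225;PV4=0.0033,1,1,1.5e-05"

print(alt_base_count(info_a), alt_base_count(info_a) < MIN_ALT_BASES)  # 1 True
print(failing_pv4_tests(info_p))                                       # ['-4']
```

Under these assumptions, the 'a' variant fails the -a cutoff and the 'P' variant fails only the -4 (end distance bias) cutoff.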

To understand the VCF format produced by samtools, go to http://samtools.sourceforge.net/samtools.shtml and see the VCF format section.


Good explanation, although the meaning of the tags "U", "M" and "S" (if they really exist) is still not clear. "M" = "MAPQ bias"? I had already consulted the samtools documentation you cite and run varFilter -h, but I do not see the answer there. Thank you so much for your time, ashutoshmits.

written 17 months ago by bioinformatica2005
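One way to see which of the letters actually occur in a given data set is to tally the first column of the '-p' output. A minimal Python sketch follows, assuming the STDERR stream was saved to a file (the name filtered.txt in the comment is hypothetical) and that the flag is the first whitespace-separated field on each line, as in the example above.

```python
from collections import Counter

def tally_flags(lines):
    """Count the first whitespace-separated field of each non-empty line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts

# Hypothetical usage with a saved copy of the STDERR output:
# with open("filtered.txt") as handle:
#     print(tally_flags(handle))

# Shortened versions of the four example lines from the question.
example = [
    "P chr01 4572 . G T 225 .",
    "G chr01 6691 . T A 13.2 .",
    "g chr01 69870 . GTTTTTTTTTTTTT GTTTTTTTTTTTT,GTTTTTTTTTTT 66.5 .",
    "a chr01 115171 . C A 6.98 .",
]
print(tally_flags(example))
```

A letter that never shows up in the tally (for instance "U", "M" or "S") simply did not trigger on that input; the tally by itself says nothing about what the letter means.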