Question

SNP Quality in VCF File

14.3 years ago

Tiffani ▴ 150

I've been editing some code on the server at work, and I've run into a problem: we need a SNP quality score to pass through the filters further down our pipeline. Does anyone know if SNP quality exists in the VCF file? If so, where?

Tags: vcf, mpileup, snp, dbsnp

Last updated 3.1 years ago by LayneSadler • written 14.3 years ago by Tiffani

14.3 years ago

Swbarnes2 ★ 1.6k

There's usually a second quality score in the file as well. If your file has a FORMAT column like "GT:PL:GQ", then each sample column will hold matching values, something like "1/1:255,255,0:99". That third value, GQ, is also a quality score; it runs from 0 up to a cap of 99.
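To make the pairing concrete, here is a minimal Python sketch that zips the FORMAT keys with one sample's values, using the example above. The function name `parse_sample` is hypothetical, and it assumes a well-formed, colon-separated sample field.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys (e.g. "GT:PL:GQ") with one sample's colon-separated values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing sample fields to be dropped; zip stops at the shorter list.
    return dict(zip(keys, values))


fields = parse_sample("GT:PL:GQ", "1/1:255,255,0:99")
print(fields["GT"], fields["PL"], int(fields["GQ"]))  # 1/1 255,255,0 99
```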

GQ is defined as a phred-scaled quality, -10 log10 P(genotype call is wrong). So again, a high number means the genotype call, and hence the SNP, is likely to be real.
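Because both QUAL and GQ are phred-scaled, it can help to convert them back to probabilities when choosing a cutoff. A small sketch (the helper names are illustrative, not from any particular library):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p to a phred-scaled score."""
    return -10 * math.log10(p)


# GQ 20 means roughly a 1% chance the genotype call is wrong; GQ 99 is far smaller.
for gq in (10, 20, 30, 99):
    print(gq, phred_to_error_prob(gq))
```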

I don't know which of the two is better to use. QUAL behaves differently if your VCF was called from multiple BAMs: it is one score for the whole site, whereas each BAM (sample) gets its own GT:PL:GQ values at each putative polymorphic locus.
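A sketch of that difference on a multi-sample line: one QUAL for the site, one GQ per sample column. The record below is made up for illustration, samples are identified by column order rather than name, and GQ is read as a float in case a caller emits non-integer values.

```python
def site_and_sample_quality(line: str) -> tuple:
    """Return (QUAL, {sample_index: GQ}) for one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    # Column 6 (index 5) is the site-level QUAL; "." means missing.
    qual = None if cols[5] == "." else float(cols[5])
    format_keys = cols[8].split(":")
    per_sample = {}
    # Columns 10 onward (index 9+) are the per-sample values, one column per BAM/sample.
    for i, sample in enumerate(cols[9:]):
        values = dict(zip(format_keys, sample.split(":")))
        gq = values.get("GQ", ".")
        per_sample[i] = None if gq == "." else float(gq)
    return qual, per_sample


# Hypothetical two-sample record.
record = "chr1\t12345\t.\tA\tG\t222\t.\tDP=40\tGT:PL:GQ\t1/1:255,255,0:99\t0/1:120,0,180:85"
print(site_and_sample_quality(record))  # (222.0, {0: 99.0, 1: 85.0})
```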

I've done a bit of Sanger confirmation of SNPs called in exome capture projects and, for what it's worth, I found that most of the entries called as homozygous SNPs in the VCF turned out to be real, even those with poor GQs. But with exome capture I was expecting a good number of SNPs, and I was counting SNPs that were off target and therefore expected to have low coverage. In a sample where everything is expected to be well covered, like a whole microbial genome, low-quality SNPs might be less likely to be real.

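I'd also consider looking at DP4, the INFO field that counts high-quality reference-forward, reference-reverse, alternate-forward and alternate-reverse bases. In some cases coverage is a good proxy for quality.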
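A minimal sketch of pulling coverage out of DP4, assuming the four comma-separated counts are in the order ref-forward, ref-reverse, alt-forward, alt-reverse (as samtools/bcftools mpileup emits them). The INFO string is made up for illustration.

```python
def parse_info(info: str) -> dict:
    """Split an INFO string like "DP=40;DP4=10,8,12,10" into a dict; flags map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def dp4_summary(info: str) -> tuple:
    """Return (high-quality depth, alt-allele fraction) computed from DP4."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in parse_info(info)["DP4"].split(","))
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt_frac = (alt_fwd + alt_rev) / depth if depth else 0.0
    return depth, alt_frac


print(dp4_summary("DP=40;DP4=10,8,12,10;MQ=60"))  # (40, 0.55)
```

The DP4 total can be lower than DP, since DP4 only counts high-quality bases.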

What was the lowest QUAL value you observed in the VCF file for a SNP that turned out to be real? Thanks a lot :-)

Reply by Biomonika (Noolean), 13.0 years ago

14.3 years ago

Rm 8.3k

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO

Source: http://www.1000genomes.org/wiki/Analysis/vcf4.0

QUAL: phred-scaled quality score for the assertion made in ALT, i.e. -10 log10 prob(call in ALT is wrong). If ALT is "." (no variant) then this is -10 log10 p(variant), and if ALT is not "." this is -10 log10 p(no variant). High QUAL scores indicate high-confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating-point number to enable higher resolution for low-confidence calls if desired. (Numeric)
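Since QUAL may be an integer, a float, or "." when missing, a parser should not assume an int. A minimal sketch of reading the eight fixed columns; the function name and example line are illustrative only.

```python
FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_fixed(line: str) -> dict:
    """Read the eight fixed VCF columns from one tab-separated data line."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(FIXED_COLUMNS, cols[:8]))
    rec["POS"] = int(rec["POS"])
    # QUAL is Numeric: accept integers or floats, and map "." (missing) to None.
    rec["QUAL"] = None if rec["QUAL"] == "." else float(rec["QUAL"])
    return rec


rec = parse_fixed("chr1\t12345\t.\tA\tG\t48.7\tPASS\tDP=40")
print(rec["ALT"], rec["QUAL"])  # G 48.7
```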

Is that the SNP quality, though? Back in the days of pileup, the output had a consensus quality, a mapping quality and a SNP quality. Which of these does the QUAL in the VCF file represent?

Reply by Tiffani, 14.3 years ago

score 15 · Accepted Answer · 2011-07-06

SNP Quality can be represented in several places in a VCF file.

1) The QUAL column, which is the phred-scaled quality score for the assertion made in ALT. In other words, it's -10 log10 P(call in ALT is wrong).

2) GQ, declared in the FORMAT column (with one value per sample in the sample columns), is the genotype quality, encoded as a phred score: -10 log10 P(genotype call is wrong).

3) Especially if you're looking at tumor/normal pairs, you may see that it's represented as VAQ (variant quality).
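Since the original goal was to feed a quality score into downstream filters, here is a sketch of a filter that checks the site-level QUAL and every sample's GQ. The thresholds (30 and 20) are placeholders, not recommendations; treating a missing value as a failure is a design choice. If your caller writes VAQ into the sample columns, passing `gq_key="VAQ"` may work, assuming the value there is numeric.

```python
def passes_quality(line, min_qual=30.0, min_gq=20.0, gq_key="GQ"):
    """Keep a VCF data line only if QUAL and every sample's GQ clear the thresholds.

    The default thresholds are illustrative placeholders, not recommended values.
    """
    cols = line.rstrip("\n").split("\t")
    # Site-level check: QUAL is column 6; "." (missing) is treated as failing.
    if cols[5] == "." or float(cols[5]) < min_qual:
        return False
    # Sites-only VCFs have no FORMAT/sample columns, so there is no GQ to check.
    if len(cols) < 10:
        return True
    keys = cols[8].split(":")
    for sample in cols[9:]:
        value = dict(zip(keys, sample.split(":"))).get(gq_key, ".")
        # Per-sample check: one missing or low GQ fails the whole record.
        if value == "." or float(value) < min_gq:
            return False
    return True


good = "chr1\t100\t.\tA\tG\t222\t.\t.\tGT:PL:GQ\t1/1:255,255,0:99\t0/1:120,0,180:85"
weak = "chr1\t200\t.\tC\tT\t222\t.\t.\tGT:PL:GQ\t1/1:255,255,0:99\t0/1:40,0,30:12"
print(passes_quality(good), passes_quality(weak))  # True False
```

Requiring every sample to pass is strict; for tumor/normal pairs you might prefer to check only the sample of interest.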

The header of the VCF should give a description of exactly which fields are present in your files and help you determine which ones contain the quality scores that you're looking for.
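A rough sketch of scanning the `##INFO` and `##FORMAT` header lines for fields whose description mentions "quality". This keyword match is only a heuristic, and QUAL itself will not show up because it is a fixed column rather than a declared field. The header lines below are illustrative.

```python
import re

# Matches e.g. ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
META_RE = re.compile(r'^##(INFO|FORMAT)=<ID=([^,>]+).*?Description="([^"]*)"')


def quality_fields(header_lines):
    """Return {(section, ID): description} for INFO/FORMAT fields mentioning quality."""
    found = {}
    for line in header_lines:
        m = META_RE.match(line)
        if m and "quality" in m.group(3).lower():
            found[(m.group(1), m.group(2))] = m.group(3)
    return found


header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
]
for (section, field_id), desc in quality_fields(header).items():
    print(section, field_id, desc)
```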

For more info, check out the VCF format description