SNP Quality in VCF File
10.4 years ago
Tiffani ▴ 150

I've been editing some code on the server at my work and have run into a problem: the filters downstream in our pipeline need a SNP quality value. Does anyone know if SNP quality exists in the VCF file? If so, where?

vcf mpileup snp dbsnp • 28k views
10.4 years ago

SNP Quality can be represented in several places in a VCF file.

1) The QUAL column, which is the phred-scaled quality score for the assertion made in ALT. In other words, it's -10log_10 prob(call in ALT is wrong).

2) GQ, the genotype quality. It is listed as a key in the FORMAT column, with its value in each sample column, and is encoded as a phred score: -10log_10 p(genotype call is wrong).

3) Especially if you're looking at tumor/normal pairs, you may see that it's represented as VAQ (variant quality).

The header of the VCF should give a description of exactly which fields are present in your files and help you determine which ones contain the quality scores that you're looking for.
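As a rough illustration of checking the header, here is a minimal Python sketch that collects the descriptions from `##INFO` and `##FORMAT` lines. The function name `header_descriptions` and the example header lines are made up for illustration; real headers vary by caller, so treat this as a starting point rather than a full VCF parser.

```python
import re

def header_descriptions(vcf_lines):
    """Map (kind, ID) -> Description for ##INFO and ##FORMAT header lines."""
    pattern = re.compile(r'^##(INFO|FORMAT)=<ID=([^,]+),.*Description="([^"]*)"')
    found = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines come first; stop at the #CHROM line
        match = pattern.match(line)
        if match:
            kind, field_id, description = match.groups()
            found[(kind, field_id)] = description
    return found

# Hypothetical header lines, not taken from any real file
example_header = [
    '##fileformat=VCFv4.0',
    '##INFO=<ID=DP4,Number=4,Type=Integer,Description="Ref-forward, ref-reverse, alt-forward and alt-reverse base counts">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1',
]
fields = header_descriptions(example_header)
for (kind, field_id), description in fields.items():
    print(kind, field_id, description)
```

In practice you would pass an open file handle instead of the example list.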

For more info, check out the VCF format description

10.4 years ago
Swbarnes2 ★ 1.5k

There's usually a second quality score in the file as well. If your file has a FORMAT column like "GT:PL:GQ", then each sample column looks something like "1/1:255,255,0:99". That third value is also a quality score, the genotype quality (GQ), and it tops out at 99.
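To make the pairing of those two columns concrete, here is a small Python sketch that matches the keys in the FORMAT column with the values in one sample column. The helper name `parse_sample` is hypothetical; the strings are the ones quoted above.

```python
def parse_sample(format_column, sample_column):
    """Pair the colon-separated FORMAT keys with one sample's values."""
    keys = format_column.split(":")
    values = sample_column.split(":")
    return dict(zip(keys, values))

sample = parse_sample("GT:PL:GQ", "1/1:255,255,0:99")
genotype_quality = int(sample["GQ"])
print(sample["GT"], genotype_quality)  # prints: 1/1 99
```

With several samples in one VCF, you would call this once per sample column, since each sample carries its own GQ.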

The GQ is defined as "a phred quality -10log_10 p(genotype call is wrong)". So again, a high number means that the SNP is likely to be real.

Which of the two is better to use, I don't know. The QUAL score behaves differently if you have multiple .bam files in your VCF, whereas each .bam file gets its own GT:PL:GQ entry at each putative polymorphic locus.

I've done a bit of Sanger confirmation of SNPs called in an exome capture project and, for what it's worth, I found that most of the entries called as homozygous SNPs in the VCF, even those with poor GQs, turned out to be real. But with exome capture I was expecting a good number of SNPs, and I was counting SNPs that were off target and therefore expected to have low coverage. On a sample where everything is expected to be well covered, like a whole microbial genome, the low-quality SNPs might be less likely to be real.

I'd also consider looking at the DP4. Coverage in some cases is a good proxy for quality.
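For reference, a minimal Python sketch for pulling DP4 out of the INFO column follows. It assumes the samtools/bcftools convention that DP4 holds four counts in the order ref-forward, ref-reverse, alt-forward, alt-reverse; check your own header to confirm. The function name `parse_dp4` and the INFO string are made up for illustration.

```python
def parse_dp4(info_column):
    """Return the four DP4 counts from an INFO string, or None if absent."""
    for entry in info_column.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP4":
            return tuple(int(count) for count in value.split(","))
    return None

info = "DP=25;DP4=1,0,12,9;MQ=50"  # made-up INFO string
# Assumed order: ref-forward, ref-reverse, alt-forward, alt-reverse
ref_fwd, ref_rev, alt_fwd, alt_rev = parse_dp4(info)
alt_depth = alt_fwd + alt_rev
total_depth = ref_fwd + ref_rev + alt_depth
print(alt_depth, total_depth)  # prints: 21 22
```

A filter could then require, say, a minimum number of alternate-supporting reads, with the threshold chosen for your own data.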


What was the lowest quality observed in the VCF file (QUAL column) that turned out to be a real SNP? Thanks a lot :-)

10.4 years ago
Rm 8.1k
  1. #CHROM
  2. POS
  3. ID
  4. REF
  5. ALT
  6. QUAL
  7. FILTER
  8. INFO

Source: http://www.1000genomes.org/wiki/Analysis/vcf4.0

QUAL: phred-scaled quality score for the assertion made in ALT, i.e. -10log_10 prob(call in ALT is wrong). If ALT is "." (no variant) then this is -10log_10 p(variant), and if ALT is not "." this is -10log_10 p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. (Numeric)
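As a worked example of the definition above, this Python sketch reads QUAL from the sixth tab-separated column of a data line and converts it back to an error probability. The helper names and the record are hypothetical, and a missing QUAL (".") is returned as None.

```python
def phred_to_error_prob(qual):
    """Convert a phred-scaled score to the probability that the call is wrong."""
    return 10 ** (-qual / 10)

def read_qual(vcf_data_line):
    """Return QUAL (6th tab-separated column) as a float, or None if it is '.'."""
    qual_text = vcf_data_line.rstrip("\n").split("\t")[5]
    return None if qual_text == "." else float(qual_text)

record = "chr1\t12345\t.\tA\tG\t30\t.\tDP=25"  # made-up record
qual = read_qual(record)
error_prob = phred_to_error_prob(qual)
print(qual, error_prob)
```

So a QUAL of 30 corresponds to roughly a 1 in 1000 chance that the call is wrong, and a QUAL of 20 to about 1 in 100.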


Is that the SNP quality, though? Back in the days of pileup, the output had a quality, a mapping quality, and a SNP quality. Which one of these does the QUAL in the VCF file represent?
