Question: How to select SNPs the most conservative way after WGS Variant Calling?
4 months ago by
serpalma.v20 wrote:


I have made a raw (unfiltered) variant call set following GATK best practices (a VCF file with ~16 million SNPs produced by GenotypeGVCFs). The original WGS data corresponds to 60 samples sequenced at an average coverage of 20x.

We want to identify a small subset of really good SNPs and another subset of really bad SNPs, which we could use for validation.

How can I construct filters that keep the SNPs most likely to be true positives and false positives, respectively?

A first choice would be to rank by QUAL and pick the SNPs at the top and the bottom of the list, but I am sure there is a more sophisticated way to do this.
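As a minimal sketch of the QUAL-ranking idea, assuming an uncompressed VCF read line by line and restricting to biallelic SNPs (the function name `rank_by_qual` and the example records are made up for illustration):

```python
import heapq

def rank_by_qual(vcf_lines, n=2):
    """Return (top_n, bottom_n) lists of (QUAL, CHROM, POS) for biallelic SNPs."""
    records = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        # Keep only single-base REF/ALT pairs with a numeric QUAL.
        if qual == "." or ref not in "ACGT" or alt not in "ACGT":
            continue
        if len(ref) != 1 or len(alt) != 1:
            continue
        records.append((float(qual), chrom, int(pos)))
    top = heapq.nlargest(n, records)
    bottom = heapq.nsmallest(n, records)
    return top, bottom

# Hypothetical records, not taken from the real call set.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t950.5\t.\t.",
    "chr1\t200\t.\tC\tT\t12.1\t.\t.",
    "chr1\t300\t.\tG\tGA\t500.0\t.\t.",  # insertion, skipped
    "chr2\t50\t.\tT\tC\t310.0\t.\t.",
]
top, bottom = rank_by_qual(example, n=1)
print(top, bottom)
```

With ~16 million sites, keeping only the n best and n worst records with a heap avoids sorting the whole list.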

Also, since the VCF contains multiple samples, would it be better to filter by site or by genotype?

Thanks and I appreciate your feedback!

sequencing snp next-gen • 199 views
4 months ago by
Istvan Albert ♦♦ 80k (University Park, USA) wrote:

You could make use of depth and allele frequencies as well. The more samples you have, the more difficult it is to understand how the QUAL field was computed and what weight it assigns to the data.

In addition, you could run a second SNP caller and treat the SNPs identified by both callers as more "credible".
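The two-caller idea could be sketched roughly as follows, assuming both call sets are available as plain VCF lines and that a SNP counts as shared when chromosome, position, REF and ALT all match (the helper `snp_keys` and the records are hypothetical):

```python
def snp_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys for every SNP allele in a VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic sites list ALT alleles separated by commas.
        for allele in alt.split(","):
            if ref in ("A", "C", "G", "T") and allele in ("A", "C", "G", "T"):
                keys.add((chrom, int(pos), ref, allele))
    return keys

# Hypothetical call sets from two callers run on the same BAMs.
caller_a = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t950\t.\t.",
    "chr1\t200\t.\tC\tT\t12\t.\t.",
    "chr2\t50\t.\tT\tC,G\t310\t.\t.",
]
caller_b = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t221\t.\t.",
    "chr2\t50\t.\tT\tC\t180\t.\t.",
    "chr2\t75\t.\tG\tA\t30\t.\t.",
]

keys_a, keys_b = snp_keys(caller_a), snp_keys(caller_b)
shared = keys_a & keys_b         # called by both: more credible
single_caller = keys_a ^ keys_b  # called by only one of the two
print(sorted(shared))
```

For real, indexed VCF files a dedicated tool such as `bcftools isec` is likely the more practical route; the set logic above only illustrates what "identified by both" means.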


Thanks Istvan

OK, I will also call variants with SAMtools/BCFtools on the same BAMs. Then I can subset both raw call sets by depth and allele frequency, and consider the SNPs common to both as the good ones.

To filter by depth, I guess I could keep only the SNPs where all samples have a depth >= 30x, as per this white paper.
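A rough sketch of such a per-sample depth check, assuming the VCF carries a per-genotype DP entry in the FORMAT column (the function name and the example records are made up; the threshold is a parameter):

```python
def all_samples_min_depth(vcf_line, min_dp=30):
    """True if every sample at this site has FORMAT/DP >= min_dp."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "DP" not in format_keys:
        return False
    dp_index = format_keys.index("DP")
    for sample in fields[9:]:
        values = sample.split(":")
        # Trailing fields may be dropped and missing values are written as "."
        if dp_index >= len(values) or values[dp_index] in (".", ""):
            return False
        if int(values[dp_index]) < min_dp:
            return False
    return True

# Hypothetical two-sample records (GT:AD:DP per sample).
deep_site = "chr1\t100\t.\tA\tG\t950\t.\t.\tGT:AD:DP\t0/1:16,15:31\t1/1:0,40:40"
shallow_site = "chr1\t200\t.\tC\tT\t80\t.\t.\tGT:AD:DP\t0/1:10,9:19\t1/1:0,35:35"

print(all_samples_min_depth(deep_site))     # both samples reach 30x
print(all_samples_min_depth(shallow_site))  # first sample has only 19x
```

With an average coverage of 20x, requiring 30x in every one of 60 samples may leave few sites, so it may be worth trying several values of `min_dp`.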

To filter by allele frequency (provided that SNPs have the required depth), I was thinking of keeping SNPs where all homozygous samples have an allele frequency of 1 or all heterozygous samples have an allele frequency of 0.5, as stated in this review.
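One way to sketch this check, assuming "allele frequency" here means the fraction of reads supporting the ALT allele in each sample (computed from the FORMAT/AD field), biallelic sites only, and a tolerance `tol` around the expected value since observed fractions are rarely exactly 0.5 or 1 (all names and records below are hypothetical):

```python
# Expected ALT read fraction for each diploid genotype at a biallelic site.
EXPECTED = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}

def alt_fraction(ad):
    """ALT read fraction from an AD string such as '16,15'; None if unusable."""
    parts = ad.split(",")
    if len(parts) != 2 or "." in parts:
        return None
    ref_reads, alt_reads = int(parts[0]), int(parts[1])
    total = ref_reads + alt_reads
    return alt_reads / total if total else None

def site_is_balanced(vcf_line, tol=0.1):
    """True if every sample's ALT fraction is within tol of its genotype's expectation."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "GT" not in keys or "AD" not in keys:
        return False
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    for sample in fields[9:]:
        values = sample.split(":")
        if ad_i >= len(values):
            return False
        gt = values[gt_i].replace("|", "/")  # treat phased like unphased
        if gt == "1/0":
            gt = "0/1"
        if gt not in EXPECTED:  # missing or multi-allelic genotype
            return False
        frac = alt_fraction(values[ad_i])
        if frac is None or abs(frac - EXPECTED[gt]) > tol:
            return False
    return True

good = "chr1\t100\t.\tA\tG\t950\t.\t.\tGT:AD:DP\t0/1:16,15:31\t1/1:0,40:40"
skewed = "chr1\t200\t.\tC\tT\t80\t.\t.\tGT:AD:DP\t0/1:25,5:30\t1/1:0,35:35"
print(site_is_balanced(good), site_is_balanced(skewed))
```

Sites that fail this check by a wide margin might be candidates for the "really bad" subset, although that is an assumption rather than something stated in the thread.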

Did I get this right?
