How do I choose variant quality filters for a VCF file?
2.6 years ago
James Reeve ▴ 110

Background:

I'm a Master's student doing bioinformatics for the first time. I have generated a .vcf file of SNPs from whole genome sequencing data using the GATK pipeline. However, I'm working with a non-model organism (Aulorhynchus flavidus), so I don't have the set of known, validated SNPs that GATK's VQSR needs. My alternative is to hard filter the variants to remove lower quality SNPs, but I'm not sure how stringently to set my filters.

Questions:

  • What is your approach to choosing the thresholds for hard filtering?
  • How many SNPs should I have to minimize noise (i.e. errors) and maximize signal?
  • Can you point out any good resources on filtering variants?

I know this is an open-ended question, and any answer will depend on the circumstances. However, without experience, I'm at a loss as to how to start. Any advice or references would help a lot.

snp variant filter • 3.2k views

Hello,

Have a look at these two documents. For me they were a good starting point:

fin swimmer
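
To make the idea of hard filtering concrete, here is a minimal sketch in plain Python. It assumes the generic starting thresholds that GATK suggests for SNPs (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). These values are not tuned for Aulorhynchus flavidus, so treat them only as a first guess and adjust them after plotting the distribution of each annotation in your own data. The names `FAIL_RULES`, `parse_info` and `failed_filters` are made up for this example.

```python
# Sketch only: thresholds are GATK's generic starting points for SNPs,
# assumed here for illustration and not tuned to any particular species.
FAIL_RULES = {
    "QD": lambda v: v < 2.0,               # quality normalised by depth
    "FS": lambda v: v > 60.0,              # strand bias (Fisher's exact test)
    "MQ": lambda v: v < 40.0,              # RMS mapping quality
    "SOR": lambda v: v > 3.0,              # strand bias (symmetric odds ratio)
    "MQRankSum": lambda v: v < -12.5,      # mapping quality, ref vs alt reads
    "ReadPosRankSum": lambda v: v < -8.0,  # read position, ref vs alt reads
}


def parse_info(info_field):
    """Turn 'QD=12.3;FS=1.2;DB' into {'QD': '12.3', 'FS': '1.2', 'DB': True}."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag without a value
    return info


def failed_filters(vcf_line):
    """Return the annotations on one VCF data line that fail their threshold.

    Annotations that are missing or not a single number are skipped,
    so a site is never failed on a value it does not have.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # column 8 of a VCF line is INFO
    failed = []
    for name, fails in FAIL_RULES.items():
        raw = info.get(name)
        if not isinstance(raw, str):
            continue
        try:
            value = float(raw)
        except ValueError:
            continue
        if fails(value):
            failed.append(name)
    return failed
```

Running `failed_filters` over every non-header line and counting how often each annotation fails shows which threshold removes the most sites, which is a reasonable place to start when deciding whether a cutoff is too strict or too lenient.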
