Question: What are good settings for filtering VCF files?
devenvyas (Stony Brook) wrote, 4.8 years ago:

I followed the samtools/bcftools/vcfutils pipeline described here http://ged.msu.edu/angus/tutorials-2012/snp_tutorial.html to convert a set of human hg19-aligned BAM files into a set of raw VCF files. I then used vcftools to filter down to just autosomal SNPs. These are really, really low-coverage genomes (they were enriched for NRY and/or mtDNA, and I am just trying to make use of the "leftovers").

Now I have the data I want, but I am trying to find out how much of it is actually usable. What are good filtering parameters for tossing/keeping human SNPs (or where can I find such parameters)? Thanks!

-Deven

 

Tags: snp, vcftools, bcftools, samtools
Ashutosh Pandey (Philadelphia) wrote, 4.8 years ago:

These are the settings I use. I generally adjust them depending on the study, but they are more or less close to what most people use.

MinDP (Minimum read depth): 5 (indels) and 3 (SNPs)
MaxDP (Maximum read depth): You have low-coverage data, so I would set it to 100. Normally it is 3 times the average coverage.
BaseQualBias (Minimum p-value for baseQ bias): 0
MinMQ (Minimum RMS mapping quality for SNPs): 20, or 30 to be more stringent
Qual (Minimum value of QUAL field): 15 or 20

StrandBias (Minimum p-value for strand bias): 0.0001
EndDistBias (Minimum p-value for end distance bias): 0.0001
MapQualBias (Minimum p-value for mapQ bias): 0
VDB (Minimum Variant Distance Bias): 0 (more relevant to RNA-seq reads)

GapWin (Window size for filtering adjacent gaps): 30 bp
SnpGap (SNP within INT bp around a gap to be filtered): 20 bp

SNPcluster (number of SNPs within a region): I usually drop all the SNPs if there are more than 3 SNPs within 10 bp.
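
To make the SNPcluster rule concrete, here is a minimal Python sketch. It is only an illustration, not the code of any particular tool: the function name `drop_snp_clusters` and the exact reading of "more than 3 SNPs within 10 bp" (a run of SNPs whose first and last positions are less than 10 bp apart) are assumptions.

```python
from collections import defaultdict


def drop_snp_clusters(snps, max_snps=3, window=10):
    """Return the SNPs that are not part of a dense cluster.

    snps: iterable of (chrom, pos) tuples.
    A cluster is any run of more than `max_snps` SNPs on one chromosome
    whose first and last positions are less than `window` bp apart
    (an assumed reading of "more than 3 SNPs within 10 bp").
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    kept = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        clustered = set()
        start = 0
        for end in range(len(positions)):
            # Advance the left edge until the window spans < `window` bp.
            while positions[end] - positions[start] >= window:
                start += 1
            # Too many SNPs in this window: mark every one of them.
            if end - start + 1 > max_snps:
                clustered.update(positions[start:end + 1])
        kept.extend((chrom, p) for p in positions if p not in clustered)
    return kept
```

Called on a list of `(chrom, pos)` pairs taken from the first two VCF columns, it returns only the SNPs outside dense clusters, grouped by chromosome and sorted by position.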


@Ashutosh: Could you explain why those are the thresholds you typically use? It would help researchers understand the parameters better. Thanks!

written 3.1 years ago by Sheila

You are right to question this. Indeed, there are absolutely no standards for these filtering criteria. See my take on DP alone: A: DP in VCF files?

written 9 months ago by Kevin Blighe

I know vcftools can filter based on DP/Qual. Do you have any recommendations on what to use for the other filters? Thanks!

written 4.8 years ago by devenvyas

This one does almost everything that's mentioned above.
 

written 4.8 years ago by poisonAlien

I have my own Python script. If you know Python, you can modify it for your use (https://github.com/ashutoshkpandey/Variants_call/blob/master/Filter_samtools_vcf.py). Alternatively, you can use the vcftools "annotate" feature (the vcf-annotate script). I think the second option is much better.
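
For anyone going the custom-script route, the depth and quality thresholds could be checked roughly as follows. This is a minimal sketch, not the script linked above: the function names are hypothetical, and it assumes each record carries `DP` and `MQ` in the INFO column, an `INDEL` flag on indel records (as samtools/bcftools output typically does), and a numeric QUAL value.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=4;MQ=37;INDEL' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        # Flags such as INDEL have no value; store them as True.
        info[key] = value if value else True
    return info


def passes_filters(vcf_line, min_dp_snp=3, min_dp_indel=5, max_dp=100,
                   min_mq=20, min_qual=15):
    """Return True if one VCF data line meets the depth/quality thresholds.

    Defaults follow the values suggested in the answer above.
    Assumes INFO contains DP and MQ and that QUAL is numeric.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])            # column 6: QUAL
    info = parse_info(fields[7])       # column 8: INFO
    depth = int(info["DP"])
    map_qual = float(info["MQ"])
    min_dp = min_dp_indel if "INDEL" in info else min_dp_snp
    return (min_dp <= depth <= max_dp
            and map_qual >= min_mq
            and qual >= min_qual)
```

Header lines (those starting with `#`) would need to be skipped before calling `passes_filters`. The bias p-value and gap filters are not covered here.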

written 4.8 years ago by Ashutosh Pandey