I followed the samtools/bcftools/vcfutils pipeline described here http://ged.msu.edu/angus/tutorials-2012/snp_tutorial.html to convert a set of human hg19-aligned BAM files into a set of raw VCF files. I then used vcftools to filter down to just autosomal SNPs. These are really, really, really low-coverage genomes (they were enriched for NRY and/or mtDNA, and I am just trying to make use of the "leftovers").
Now I have the data I want, but I am trying to find out which of it is actually usable. I was wondering what good filtering parameters are for tossing/keeping human SNPs (or where I can find such parameters). Thanks!
This is what I use. I generally adjust the values depending on the study, but this is close to the settings most people use.
MinDP (Minimum read depth): 5 (Indels) and 3 (SNPs)
MaxDP (Maximum read depth): You have low-coverage data, so I would set it to 100. Normally it is set to 3 times the average coverage.
BaseQualBias (Minimum p-value for baseQ bias): 0
MinMQ (Minimum RMS mapping quality for SNPs): 20 or 30 (to be more stringent)
Qual (Minimum value of QUAL field): 15 or 20
StrandBias (Minimum p-value for strand bias): 0.0001
EndDistBias (Minimum p-value for end distance bias): 0.0001
MapQualBias (Minimum p-value for mapQ bias): 0
VDB (Minimum Variant Distance Bias): 0 (more relevant to RNA-seq reads)
GapWin (Window size for filtering adjacent gaps): 30 bp
SnpGap (SNP within INT bp around a gap to be filtered): 20 bp
SNPcluster (number of SNPs within a region): I usually drop all the SNPs if there are more than 3 SNPs within 10 bp.
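A minimal sketch of how the per-site cut-offs above could be applied in Python. It assumes a VCF from the older samtools mpileup/bcftools pipeline, where INFO carries DP, MQ (RMS mapping quality), an INDEL flag on indels, and a PV4 tag holding the strand, baseQ, mapQ and end-distance bias p-values in that order. `Thresholds`, `parse_info`, `failed_filters` and `max_dp_from_coverage` are hypothetical names for illustration. This is not how vcfutils.pl varFilter itself is implemented, and newer bcftools releases annotate sites differently, so check your own header first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Cut-offs from the list above; every value is a tunable default."""
    min_dp_snp: int = 3
    min_dp_indel: int = 5
    max_dp: int = 100          # low-coverage setting; otherwise ~3x mean depth
    min_mq: float = 20.0       # RMS mapping quality, applied to SNPs only
    min_qual: float = 15.0
    # Minimum p-values for the four PV4 tests; 0 means the test never filters.
    min_p_strand: float = 1e-4
    min_p_baseq: float = 0.0
    min_p_mapq: float = 0.0
    min_p_enddist: float = 1e-4


def max_dp_from_coverage(mean_depth: float) -> int:
    """Rule of thumb for ordinary data: maximum depth = 3 x average coverage."""
    return int(round(3 * mean_depth))


def parse_info(info: str) -> dict:
    """Split 'DP=12;MQ=40;INDEL' into {'DP': '12', 'MQ': '40', 'INDEL': True}."""
    parsed = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item:
            parsed[item] = True
    return parsed


def failed_filters(vcf_line: str, t: Optional[Thresholds] = None) -> list:
    """Return the names of the filters a VCF data line fails; [] means keep it."""
    t = t or Thresholds()
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5]) if fields[5] != "." else 0.0
    info = parse_info(fields[7])
    is_indel = "INDEL" in info
    failed = []

    depth = int(info.get("DP", 0))
    if depth < (t.min_dp_indel if is_indel else t.min_dp_snp):
        failed.append("MinDP")
    if depth > t.max_dp:
        failed.append("MaxDP")
    if not is_indel and float(info.get("MQ", 0)) < t.min_mq:
        failed.append("MinMQ")
    if qual < t.min_qual:
        failed.append("Qual")

    # PV4 = p-values for strand, baseQ, mapQ and end-distance bias (assumed order).
    if "PV4" in info:
        p_values = [float(x) for x in info["PV4"].split(",")]
        cutoffs = [t.min_p_strand, t.min_p_baseq, t.min_p_mapq, t.min_p_enddist]
        names = ["StrandBias", "BaseQualBias", "MapQualBias", "EndDistBias"]
        for name, p, cutoff in zip(names, p_values, cutoffs):
            if p < cutoff:
                failed.append(name)
    return failed
```

Returning the list of failed filter names, rather than a bare True/False, makes it easy to count which rule removes most of your sites, which is useful when tuning thresholds for very low-coverage samples.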
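The two distance-based rules (SNPcluster and SnpGap) could be sketched like this, assuming you process one chromosome at a time and have the SNP and indel positions as integers. `cluster_filter` and `near_indel_filter` are hypothetical helper names. The gap check is a simplification: it measures distance to each indel's start position only and ignores indel length, which a real tool may take into account.

```python
from bisect import bisect_left


def cluster_filter(positions, max_snps=3, window=10):
    """Flag every SNP lying in some `window`-bp stretch that holds more than
    `max_snps` SNPs. `positions` are the SNP positions on one chromosome."""
    pos = sorted(positions)
    flagged = set()
    left = 0
    for right in range(len(pos)):
        # Advance `left` until pos[left]..pos[right] spans at most `window` bp.
        while pos[right] - pos[left] + 1 > window:
            left += 1
        if right - left + 1 > max_snps:
            flagged.update(pos[left:right + 1])
    return flagged


def near_indel_filter(snp_positions, indel_positions, snp_gap=20):
    """Flag SNPs within `snp_gap` bp of an indel start (indel length ignored)."""
    indels = sorted(indel_positions)
    flagged = set()
    for p in snp_positions:
        # First indel at or after p - snp_gap; flag if it is also <= p + snp_gap.
        i = bisect_left(indels, p - snp_gap)
        if i < len(indels) and indels[i] <= p + snp_gap:
            flagged.add(p)
    return flagged


if __name__ == "__main__":
    snps = [100, 102, 105, 109, 500]
    print(sorted(cluster_filter(snps)))            # [100, 102, 105, 109]
    print(sorted(near_indel_filter(snps, [515])))  # [500]
```

The sliding window keeps the cluster check linear after sorting, and the binary search keeps the gap check cheap even with many indels on a chromosome.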