Question: VCF filtering with a maximum coverage threshold
rwn (United Kingdom) wrote, 5.7 years ago:

Hello all,

I am working on finding SNPs in a small number of highly similar Pseudomonas genomes. I've used freebayes to call variants with something like:

freebayes -f myREF.fasta --ploidy 1 --standard-filters -F 0.95 -C 5 myBams.sorted.bam > freebayes.vcf

I've already applied some filters, as above, but now I'd like to filter further using the vcffilter program. My question is what a "sensible" set of filtering criteria might be, particularly for setting a maximum coverage cut-off (i.e. something along the lines of "DP < 250"). I'm worried about including SNPs from regions of the genome with very high coverage, such as insertion sequences and other TEs or repeated regions (or at least I'd like to see what effect filtering out these regions has).
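To make the idea concrete, here is a minimal Python sketch of a maximum-depth filter, applying the same condition as the "DP < 250" expression above. It assumes the depth of interest is the DP tag in the INFO column (for a multi-sample VCF that tag is usually the depth summed over samples), and the function names `info_depth` and `filter_max_depth` are made up for this illustration rather than taken from any tool.

```python
def info_depth(vcf_line):
    """Return the INFO DP value of a VCF record line, or None if absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def filter_max_depth(lines, max_dp=250):
    """Yield header lines unchanged and records whose INFO DP is below max_dp."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        dp = info_depth(line)
        # Assumption: records with no DP tag are kept rather than dropped.
        if dp is None or dp < max_dp:
            yield line


# Possible usage (file names are placeholders):
# with open("freebayes.vcf") as src, open("freebayes.maxdp.vcf", "w") as dst:
#     dst.writelines(filter_max_depth(src, max_dp=250))
```

Running it with a few different values of `max_dp` and counting the surviving records is one way to see how sensitive the SNP set is to the cut-off.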

I realise it's a bit of a how-long-is-a-piece-of-string type question, but was just wondering what people's thoughts were...


brentp (Salt Lake City, UT) wrote, 5.7 years ago:

You might start with Heng Li's paper:

and the associated script(s):

(Hopefully someone will implement and distribute a Python/C/Perl-based version of those filters.)
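The paper's filters and scripts are not reproduced here. As a rough illustration of how a maximum-depth cut-off could be derived from the data instead of being picked by hand, the Python sketch below takes the mean depth and adds a multiple of its square root. The reasoning is that if depth were Poisson-distributed, its standard deviation would be the square root of its mean. The multiplier `k` and the function names are assumptions for this example, and the mean here is taken over variant sites only, which may differ from the genome-wide average depth.

```python
import math


def record_depths(lines):
    """Collect INFO DP values from VCF record lines, skipping header lines."""
    depths = []
    for line in lines:
        if line.startswith("#"):
            continue
        for field in line.rstrip("\n").split("\t")[7].split(";"):
            if field.startswith("DP="):
                depths.append(int(field[3:]))
                break
    return depths


def max_depth_cutoff(depths, k=4.0):
    """Return mean depth plus k times the square root of the mean.

    k is a tunable assumption, not a recommended value.
    """
    mean = sum(depths) / len(depths)
    return mean + k * math.sqrt(mean)
```

The resulting number could then be used as the maximum DP threshold in whatever filtering tool is preferred.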


rwn replied, 5.7 years ago:

Thanks for the link to the paper, brentp :)