Question: VCF filtering with a maximum coverage threshold
rwn wrote (5.3 years ago):

Hello all,

I am working on finding SNPs in a small number of highly similar Pseudomonas genomes. I've used freebayes to call variants with something like:

freebayes -f myREF.fasta --ploidy 1 --standard-filters -F 0.95 -C 5 myBams.sorted.bam > freebayes.vcf

I've already applied some filters, as above, but now I'd like to filter further using the vcffilter program. My question is what might be a "sensible" set of filtering criteria, particularly for setting a maximum coverage cut-off (i.e. something along the lines of "DP < 250"). I'm worried about including SNPs from regions of the genome with very high coverage, such as insertion sequences and other TEs/repeated regions (or at least I'd like to see what effect filtering out these regions has).
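For concreteness, a fixed cut-off such as "DP < 250" could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes an uncompressed, tab-separated VCF whose INFO column carries a DP tag (total read depth at the site, as freebayes writes it), and the function names and the example records are made up for illustration.

```python
def info_depth(vcf_line):
    """Return the INFO DP value of one VCF record line, or None if absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def filter_max_depth(lines, max_dp=250):
    """Yield header lines unchanged, plus records with INFO DP below max_dp."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        dp = info_depth(line)
        # Records without a DP tag are kept; dropping them would be
        # an equally defensible choice.
        if dp is None or dp < max_dp:
            yield line


# Hypothetical records, only to show the behaviour.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=40;AF=1\n",
    "chr1\t200\t.\tC\tT\t50\t.\tDP=900;AF=1\n",
]
kept = list(filter_max_depth(example, max_dp=250))
```

With vcflib itself, the equivalent should be something like vcffilter -f "DP < 250" freebayes.vcf > filtered.vcf. The value 250 is just the placeholder from the question, not a recommended threshold.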

I realise it's a bit of a how-long-is-a-piece-of-string type question, but was just wondering what people's thoughts were...

Cheers!

brentp wrote (5.3 years ago):

You might start with Heng Li's paper: http://bioinformatics.oxfordjournals.org/content/early/2014/07/03/bioinformatics.btu356.full

and the associated script(s): https://github.com/lh3/varcmp/blob/master/scripts/vcf-extra-flt.js

(hopefully someone will implement and distribute a Python/C/Perl-based version of those filters)
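As far as I recall, the paper's maximum-depth filter sets the cut-off relative to the average depth d, at roughly d + 3*sqrt(d) or d + 4*sqrt(d), instead of using a fixed number. Treat that as an assumption and check it against the paper, which deals with high-coverage human data, so the multiplier may well need tuning for haploid bacterial genomes. A minimal sketch of the calculation (the function name and the depth values are hypothetical, and this is not a port of vcf-extra-flt.js):

```python
import math


def max_depth_threshold(depths, k=4.0):
    """Return d + k*sqrt(d), where d is the mean of the given depths."""
    depths = list(depths)
    if not depths:
        raise ValueError("no depths given")
    d = sum(depths) / len(depths)
    return d + k * math.sqrt(d)


# Hypothetical per-site depths with a mean of 100.
site_depths = [100, 100, 100, 100]
threshold = max_depth_threshold(site_depths)  # 100 + 4 * sqrt(100)
```

The mean is better taken over the whole genome (for example from samtools depth output) than over variant sites only, since variant sites in repeats would pull the average up.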


Thanks for the link to the paper brentp :)

(reply by rwn, 5.3 years ago)