Maximum Depth Filter
1
0
Entering edit mode
3.9 years ago
ATpoint 50k

I would like to filter VCF files with a maximum depth filter, so removing variants with a coverage more than a threshold (in this case d+3*sqrt(d), d=mean coverage of the file), as suggested in Heng Li's paper from 2014. I was thinking about a custom script, including something like sambamba/samtools depth, but maybe there is an implementation somewhere that I missed which can do this task much more efficiently? Any ideas?

VCF Filter Maximum Depth • 1.6k views
0
Entering edit mode
3.9 years ago

assuming all variants have a DP field get mean DP using http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk.jar -e 'double dp=0.0;int n=0;while(iter.hasNext()) {dp+=iter.next().getAttributeAsInt("DP",0);n++;} println(dp/n); ' input.vcf
188.22544283413848


and then filter using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 java -jar dist/vcffilterjdk.jar -e 'final double dp= 188.22544283413848;final double treshold = dp + 3 * Math.sqrt(dp);  return variant.getAttributeAsInt("DP",0) > treshold;'input.vcf

0
Entering edit mode

Thanks! What about mean depth of the entire WGS BAM instead of the mean of the DPs? Any ideas about that?

0
Entering edit mode

use whatever you want.

0
Entering edit mode

That would've been my next step ^^ Probably fetching a subsample of the WGS file, maybe all promoters, and then running any depth tool on it should be sufficient to get a solid estimate ov average depth.