Question: variants filtering with maf>5% every 100kb
1
gravatar for GK1610
4 weeks ago by
GK161060
United States
GK161060 wrote:

I have a genotype vcf file for which I want to do trimming such that I have one variant with MAF > 5% every 100kb.

Does anyone have suggestions to do this smartly using bcftools or vcftools ?

Thanks Kiran

snp • 95 views
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by GK161060
3
gravatar for Pierre Lindenbaum
4 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

using VcfFilterJdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

private String prevContig=null;
private int prevPos=-1;
private final int distance= 100_000;

public Object apply(final VariantContext variant) {
    final double AF = variant.getAttributeAsDoubleList("AF",1.0).stream().mapToDouble(Double::doubleValue).min().orElse(1.0);
    if(AF>0.05) return false;
    if(!variant.getContig().equals(prevContig) || variant.getStart()> prevPos )
        {
        prevContig = variant.getContig();
        prevPos = variant.getEnd() + distance;
        return true;
        }
    return false;
}

usage:

$ wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr14.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | gunzip  -c | java -jar dist/vcffilterjdk.jar -f jeter.code --body 

(...)
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  19000017    rs375700886 C   T   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=8633;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19100025    rs201348429 G   A   100 PASS    AA=G|||;AC=5;AF=0.000998403;AFR_AF=0;AMR_AF=0;AN=5008;DP=35531;EAS_AF=0.004;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19200040    rs543063896 G   T   100 PASS    AA=g|||;AC=2;AF=0.000399361;AFR_AF=0;AMR_AF=0;AN=5008;DP=18098;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0.001;VT=SNP
14  19300074    rs531199478 T   A   100 PASS    AA=-|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=16207;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19400095    rs560944058 A   G   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=32464;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19500295    rs557566396 T   A   100 PASS    AA=T|||;AC=5;AF=0.000998403;AFR_AF=0;AMR_AF=0.0043;AN=5008;DP=65422;EAS_AF=0;EUR_AF=0.002;NS=2504;SAS_AF=0;VT=SNP
14  19600309    rs549731335 A   G   100 PASS    AA=A|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=38613;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  19700430    rs549302478 T   C   100 PASS    AA=T|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=34929;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0;VT=SNP
14  19800530    rs572589774 G   A   100 PASS    AA=.|||;AC=3;AF=0.000599042;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=26775;EAS_AF=0;EUR_AF=0.002;NS=2504;SAS_AF=0;VT=SNP
14  19900580    rs557052698 G   C   100 PASS    AA=G|||;AC=2;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=5008;DP=37564;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20000898    rs532972399 G   GT  100 PASS    AA=?|T|TT|unsure;AC=3;AF=0.000599042;AFR_AF=0.0023;AMR_AF=0;AN=5008;DP=30057;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=INDEL
14  20100978    rs534676846 C   T   100 PASS    AA=c|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=12705;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20201045    rs577045445 T   C   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=26164;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20301077    rs546456970 C   A   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=30763;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20401244    rs542803776 C   A   100 PASS    AA=c|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=12769;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20501262    rs143468540 C   T   100 PASS    AA=C|||;AC=9;AF=0.00179712;AFR_AF=0.0068;AMR_AF=0;AN=5008;DP=20808;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20601283    rs536911573 G   T   100 PASS    AA=G|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=23461;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
(...)
ADD COMMENTlink written 4 weeks ago by Pierre Lindenbaum118k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1151 users visited in the last hour