variants filtering with maf>5% every 100kb
1
1
Entering edit mode
5.8 years ago
GK1610 ▴ 120

I have a genotype vcf file for which I want to do trimming such that I have one variant with MAF > 5% every 100kb.

Does anyone have suggestions to do this smartly using bcftools or vcftools ?

Thanks Kiran

SNP • 1.2k views
ADD COMMENT
3
Entering edit mode
5.8 years ago

using VcfFilterJdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

private String prevContig=null;
private int prevPos=-1;
private final int distance= 100_000;

public Object apply(final VariantContext variant) {
    final double AF = variant.getAttributeAsDoubleList("AF",1.0).stream().mapToDouble(Double::doubleValue).min().orElse(1.0);
    if(AF>0.05) return false;
    if(!variant.getContig().equals(prevContig) || variant.getStart()> prevPos )
        {
        prevContig = variant.getContig();
        prevPos = variant.getEnd() + distance;
        return true;
        }
    return false;
}

usage:

$ wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr14.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | gunzip  -c | java -jar dist/vcffilterjdk.jar -f jeter.code --body 

(...)
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  19000017    rs375700886 C   T   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=8633;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19100025    rs201348429 G   A   100 PASS    AA=G|||;AC=5;AF=0.000998403;AFR_AF=0;AMR_AF=0;AN=5008;DP=35531;EAS_AF=0.004;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19200040    rs543063896 G   T   100 PASS    AA=g|||;AC=2;AF=0.000399361;AFR_AF=0;AMR_AF=0;AN=5008;DP=18098;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0.001;VT=SNP
14  19300074    rs531199478 T   A   100 PASS    AA=-|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=16207;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19400095    rs560944058 A   G   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=32464;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
14  19500295    rs557566396 T   A   100 PASS    AA=T|||;AC=5;AF=0.000998403;AFR_AF=0;AMR_AF=0.0043;AN=5008;DP=65422;EAS_AF=0;EUR_AF=0.002;NS=2504;SAS_AF=0;VT=SNP
14  19600309    rs549731335 A   G   100 PASS    AA=A|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=38613;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  19700430    rs549302478 T   C   100 PASS    AA=T|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=34929;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0;VT=SNP
14  19800530    rs572589774 G   A   100 PASS    AA=.|||;AC=3;AF=0.000599042;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=26775;EAS_AF=0;EUR_AF=0.002;NS=2504;SAS_AF=0;VT=SNP
14  19900580    rs557052698 G   C   100 PASS    AA=G|||;AC=2;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=5008;DP=37564;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20000898    rs532972399 G   GT  100 PASS    AA=?|T|TT|unsure;AC=3;AF=0.000599042;AFR_AF=0.0023;AMR_AF=0;AN=5008;DP=30057;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=INDEL
14  20100978    rs534676846 C   T   100 PASS    AA=c|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=12705;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20201045    rs577045445 T   C   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=26164;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20301077    rs546456970 C   A   100 PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=30763;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20401244    rs542803776 C   A   100 PASS    AA=c|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=12769;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20501262    rs143468540 C   T   100 PASS    AA=C|||;AC=9;AF=0.00179712;AFR_AF=0.0068;AMR_AF=0;AN=5008;DP=20808;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP
14  20601283    rs536911573 G   T   100 PASS    AA=G|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=23461;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP
(...)
ADD COMMENT

Login before adding your answer.

Traffic: 2338 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6