Question: Filter variants having no AF information
20 months ago by waqaskhokhar999 (100) wrote:

I have > 1000 samples and I want to filter out variants based on minor allele frequency. My input dataset is a VCF file in this format:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  88  108 139 159 265 350

1   55  .   C   T   40  PASS    DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:36:4    0|0:32:9    0|0:30:4    ./.:.:. ./.:.:.

1   56  .   T   A   40  PASS    DP=6785;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. ./.:.:. 0|0:32:9    0|0:30:4    ./.:.:. ./.:.:.

1   63  .   T   C   40  PASS    DP=7053;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:40:5    0|0:32:9    0|0:38:5    ./.:.:. ./.:.:.

1   73  .   C   A   40  PASS    DP=8169;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:40:5    0|0:40:9    0|0:38:6    ./.:.:. ./.:.:.

How can I keep SNPs with minor allele frequency >= 0.05?

modified 20 months ago • written 20 months ago by waqaskhokhar999 (100)

see Updating allele frequency (AF) and minor allele frequency (MAF) INFO fields in .vcf

written 20 months ago by Pierre Lindenbaum (133k)

I am trying to compile vcffilterjdk but I am getting this error:

Task :vcffilterjdk FAILED Downloading http://central.maven.org/maven2/com/github/samtools/htsjdk/2.19.0/htsjdk-2.19.0.jar to /home/waqas/jvarkit/lib/com/github/samtools/htsjdk/2.19.0/htsjdk-2.19.0.jar

FAILURE: Build failed with an exception.

BUILD FAILED in 0s 1 actionable task: 1 executed

written 20 months ago by waqaskhokhar999 (100)

As you can see, "http://central.maven.org/maven2/com/github/samtools/htsjdk/2.19.0/htsjdk-2.19.0.jar" exists.

Are you running behind a proxy? Try to fix it: https://stackoverflow.com/questions/5991194

written 20 months ago by Pierre Lindenbaum (133k)

I have fixed the proxy settings, and VcfFilterJdk took around 3 hours to complete. However, it only updated the header; it didn't update the INFO field. As you can see from the INFO field (DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)), I don't have AN/AC tags in it. Do I need to add the AN/AC fields first and then apply this script? The current output of VcfFilterJdk is:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=q25,Description="Quality below 25">
##FILTER=<ID=q30,Description="Quality below 30">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=FT,Number=.,Type=String,Description="Genotype-level filter">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank  | Genotype_Number [ | ERRORS | WARNINGS ] )'">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Min Allele Frequency">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##vcffilterjdk.meta=compilation:20190507213457 githash:ca6efffb htsjdk:2.19.0 date:20190507225208 cmd:-e VariantContextBuilder vcb = new VariantContextBuilder(variant); float ac = variant.getAttributeAsInt( AN ,0); if(ac>0) { List<Float> af = variant.getAttributeAsIntList( AC ,0).stream().map(N->N/ac).collect(Collectors.toList());vcb.attribute( AF ,af);vcb.attribute( MAF ,af.stream().mapToDouble(X->X.floatValue()).min().orElse(-1.0) );} return vcb.make();
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  88  108 139
1   55  .   C   T   40  PASS    DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:4:36    0|0:9:32
1   56  .   T   A   40  PASS    DP=6785;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. ./. 0|0:9:32
1   63  .   T   C   40  PASS    DP=7053;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:5:40    0|0:9:32
1   73  .   C   A   40  PASS    DP=8169;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:5:40    0|0:9:40
modified 20 months ago by finswimmer (14k) • written 20 months ago by waqaskhokhar999 (100)
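
The expression recorded in the vcffilterjdk.meta header line reads AN with a default of 0 and only sets AF/MAF when that value is greater than 0, so records without AN/AC pass through with INFO unchanged, which matches the output above. One way around this is to count alleles directly from the GT fields. The following is a minimal Python sketch of that idea, not the VcfFilterJdk method. It assumes tab-separated records, that GT is the first FORMAT key (as in this file), and that MAF means the frequency of the second most common allele (for biallelic sites, min(AF, 1 - AF)). The function names `allele_counts`, `minor_allele_frequency` and `keep`, and the second example record, are hypothetical.

```python
import re

def allele_counts(sample_fields, n_alt):
    """Return (AN, AC) computed from the per-sample columns of one VCF record."""
    an = 0
    ac = [0] * n_alt
    for field in sample_fields:
        gt = field.split(":")[0]  # assumption: GT is the first FORMAT key
        for allele in re.split(r"[|/]", gt):
            if allele == ".":     # missing call, not counted in AN
                continue
            an += 1
            if int(allele) > 0:   # 0 is REF, 1..n index the ALT alleles
                ac[int(allele) - 1] += 1
    return an, ac

def minor_allele_frequency(record):
    """Return the MAF of one tab-separated VCF record, or None if no allele was called."""
    cols = record.rstrip("\n").split("\t")
    n_alt = len(cols[4].split(","))
    an, ac = allele_counts(cols[9:], n_alt)
    if an == 0:
        return None
    # Frequency of REF first, then one frequency per ALT allele.
    freqs = [(an - sum(ac)) / an] + [c / an for c in ac]
    # Assumption: MAF is the frequency of the second most common allele.
    return sorted(freqs)[-2]

def keep(record, threshold=0.05):
    """True if the record has called genotypes and MAF >= threshold."""
    maf = minor_allele_frequency(record)
    return maf is not None and maf >= threshold

# First record from the question (six samples shown; EFF annotation omitted).
site_55 = "\t".join(["1", "55", ".", "C", "T", "40", "PASS", "DP=6720", "GT:GQ:DP",
                     "./.:.:.", "0|0:36:4", "0|0:32:9", "0|0:30:4", "./.:.:.", "./.:.:."])
# Hypothetical record with one heterozygous sample, for illustration only.
site_het = "\t".join(["1", "99", ".", "G", "A", "40", "PASS", "DP=100", "GT:GQ:DP",
                      "0|1:36:4", "0|0:32:9", "0|0:30:4", "0|0:30:4"])

for rec in (site_55, site_het):
    print(rec.split("\t")[1], minor_allele_frequency(rec), keep(rec))
```

With only the six samples shown in the question, every called genotype at position 55 is 0|0, so that site would be dropped at a 0.05 threshold; the full set of > 1000 samples may give a different result.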