Question: awk code for Exac MAF values
basalganglia (England) wrote 3.6 years ago:

I have a VCF as follows, and I want to keep the rows whose ExAC value is less than or equal to 0.02, and also the rows whose value is unknown ("."). How can I write awk code for this? I tried cat a.txt | awk '$6 <= "0.02"' | awk '$6 == "."' > but it does not work. The ExAC values are in column 6. Could you please help me?

Chr     Start   End     Ref     Alt     ExAC_ALL        ExAC_AFR        ExAC_AMR        ExAC_EAS        ExAC_FIN        ExAC_NFE        ExAC_OTH        ExAC_SAS        Otherinfo
1       12783   12783   G       A       .       .       .       .       .       .       .       .       0.5     881.62  27      1       12783   .       G       A       881.62  .       ABHet=0.279;ABHom=0.689;AC=33;AF=0.786;AN=42;BaseQRankSum=2.245;DP=1005;Dels=0.00;FS=0.000;HaplotypeScore=0.1330;InbreedingCoeff=0.0782;MLEAC=33;MLEAF=0.786;MQ=5.42;MQ0=949;MQRankSum=-0.409;OND=0.293;QD=1.77;ReadPosRankSum=-0.211;ANN=A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000438504|unprocessed_pseudogene||n.*1783C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000541675|unprocessed_pseudogene||n.*1416C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000423562|unprocessed_pseudogene||n.*1669C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000488147|unprocessed_pseudogene||n.*1351C>T|||||1621|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000538476|unprocessed_pseudogene||n.*1583C>T|||||1628|,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328|processed_transcript|2/2|n.468+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000515242|transcribed_unprocessed_pseudogene|2/2|n.465+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000518655|transcribed_unprocessed_pseudogene|2/3|n.481+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000450305|transcribed_unprocessed_pseudogene|3/5|n.182+86G>A||||||   GT:AD:DP:GQ:PL  0/1:3,25:27:15:102,0,15
RamRS (Houston, TX) wrote 3.6 years ago:

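Quotes are not needed around numbers. Also, piping one awk into another keeps only the lines that pass both tests (a logical AND), whereas you want lines that pass either test (a logical OR). Try this: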

cat a.txt |  awk '{if($6 <=0.02 || $6 == ".")  print }'

In case that doesn't work, try using " " as a delimiter by passing the stream through tr -s " " first.
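The tr -s suggestion can be sketched as follows, assuming the columns are padded with runs of spaces (the sample rows are hypothetical, not taken from a.txt). tr -s " " squeezes each run of spaces into one, which matters once awk is told to split on exactly one space. With awk's default splitting, runs of blanks already count as a single separator, so the extra step is usually unnecessary.

```shell
# Hypothetical sample rows padded with runs of spaces (not from the original file).
sample='Chr     Start   End     Ref     Alt     ExAC_ALL
1       12783   12783   G       A       .
1       13000   13000   C       T       0.0100
1       14000   14000   A       G       0.3500'

# Squeeze repeated spaces, then split on exactly one space.
# -F"[ ]" is a literal single space; without tr -s it would yield empty fields.
filtered=$(printf '%s\n' "$sample" | tr -s ' ' | awk -F'[ ]' '$6 <= 0.02 || $6 == "."')

printf '%s\n' "$filtered"
```

The header line is dropped here because "ExAC_ALL" is compared as text and sorts after "0.02"; add NR == 1 || to the condition if the header should be kept.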


Yes, it works, but the file also includes numbers such as 0,04561, which is shown as 4,56E-02.

I don't understand the sentence "try using " " as a delimiter by passing the stream through tr -s " " first". Could you please explain it?

thanks !!!

BG

Reply written 3.6 years ago by basalganglia
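If the values in the file really are written with a decimal comma, as 0,04561 and 4,56E-02 suggest (an assumption; the comma may only reflect how the poster's software displays numbers), awk will not treat them as numbers and will compare them as text instead. A rough sketch of one workaround is to copy the field, swap the comma for a point, and force a numeric comparison. The sample rows are hypothetical.

```shell
# Hypothetical rows using a decimal comma and E notation (not from the original file).
sample='1 100 100 G A 0,04561
1 200 200 C T 4,56E-02
1 300 300 A G 0,0100
1 400 400 T C .'

filtered=$(printf '%s\n' "$sample" | awk '{
    v = $6                                # work on a copy so the printed line is unchanged
    sub(/,/, ".", v)                      # 0,04561 -> 0.04561 ; 4,56E-02 -> 4.56E-02
    if (v == "." || v + 0 <= 0.02) print  # v + 0 forces a numeric comparison
}')

printf '%s\n' "$filtered"
```

awk reads E notation such as 4.56E-02 as an ordinary number once the decimal separator is a point. Note that v + 0 turns any non-numeric text into 0, so a header line would also be kept by this version.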
Jorge Amigo (Santiago de Compostela, Spain) wrote 3.6 years ago:

A simple awk '$6<=0.02' a.txt would do, since the "." would be included in that filter: a field that does not look like a number is compared as text, and "." sorts before any digit.
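The claim that "." passes the numeric-looking test can be checked with a small sketch. The sixth-column values below are hypothetical, the comparison uses <= as the question asks, and LC_ALL=C is set here only to make the text ordering predictable.

```shell
# Feed four hypothetical records whose sixth field is ".", 0.01, 0.5 and a header name.
# printf reuses the format once per argument, so this produces four lines.
kept=$(printf 'a b c d e %s\n' . 0.01 0.5 ExAC_ALL | LC_ALL=C awk '$6 <= 0.02 { print $6 }')

printf '%s\n' "$kept"
```

Only "." and 0.01 survive: 0.01 is compared as a number, while "." and "ExAC_ALL" are compared as text against "0.02", where "." sorts before "0" and "E" sorts after it.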
