Question: awk code for ExAC MAF values
3.8 years ago by
basalganglia30 wrote:

I have a VCF as follows, and I want to keep the rows whose ExAC value is less than or equal to 0.02, as well as the rows with an unknown value ("."). How can I write awk code for this? I tried cat a.txt | awk '$6 <= "0.02"' | awk '$6 == "."' > but it does not work. The ExAC values are in column 6. Could you please help me?

Chr     Start   End     Ref     Alt     ExAC_ALL        ExAC_AFR        ExAC_AMR        ExAC_EAS        ExAC_FIN        ExAC_NFE        ExAC_OTH        ExAC_SAS        Otherinfo
1       12783   12783   G       A       .       .       .       .       .       .       .       .       0.5     881.62  27      1       12783   .       G       A       881.62  .       ABHet=0.279;ABHom=0.689;AC=33;AF=0.786;AN=42;BaseQRankSum=2.245;DP=1005;Dels=0.00;FS=0.000;HaplotypeScore=0.1330;InbreedingCoeff=0.0782;MLEAC=33;MLEAF=0.786;MQ=5.42;MQ0=949;MQRankSum=-0.409;OND=0.293;QD=1.77;ReadPosRankSum=-0.211;ANN=A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000438504|unprocessed_pseudogene||n.*1783C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000541675|unprocessed_pseudogene||n.*1416C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000423562|unprocessed_pseudogene||n.*1669C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000488147|unprocessed_pseudogene||n.*1351C>T|||||1621|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000538476|unprocessed_pseudogene||n.*1583C>T|||||1628|,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328|processed_transcript|2/2|n.468+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000515242|transcribed_unprocessed_pseudogene|2/2|n.465+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000518655|transcribed_unprocessed_pseudogene|2/3|n.481+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000450305|transcribed_unprocessed_pseudogene|3/5|n.182+86G>A||||||   GT:AD:DP:GQ:PL  0/1:3,25:27:15:102,0,15
modified 3.8 years ago by Jorge Amigo11k • written 3.8 years ago by basalganglia30
3.8 years ago by
Houston, TX
RamRS20k wrote:

Quotes are not needed around numbers. Also, chaining two awk commands with a pipe keeps only the lines that pass both tests, whereas you want the lines that pass either one. Try this:

cat a.txt | awk '{ if ($6 <= 0.02 || $6 == ".") print }'

In case that doesn't work, try using " " as a delimiter by passing the stream through a tr -s " " first.
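As a minimal sketch of the filter above, the following builds a small tab-separated sample, keeps the header line as well, and shows what tr -s " " does. The file names sample.txt and filtered.txt, the function name filter_exac, and the sample values are made up for illustration; only column 6 matters here.

```shell
# Hypothetical sample: header plus four variants; column 6 holds ExAC_ALL.
printf 'Chr\tStart\tEnd\tRef\tAlt\tExAC_ALL\n'  > sample.txt
printf '1\t12783\t12783\tG\tA\t.\n'            >> sample.txt
printf '1\t13000\t13000\tC\tT\t0.01\n'         >> sample.txt
printf '1\t14000\t14000\tA\tG\t0.02\n'         >> sample.txt
printf '1\t15000\t15000\tT\tC\t4.56E-02\n'     >> sample.txt

# Keep the header (NR == 1), unknown values ("."), and values <= 0.02.
# $6 + 0 forces a numeric comparison, so 4.56E-02 is read as 0.0456.
# Assumption: column 6 only ever contains "." or a number.
filter_exac() {
    awk -F'\t' 'NR == 1 || $6 == "." || $6 + 0 <= 0.02' "$1"
}

filter_exac sample.txt > filtered.txt
cat filtered.txt

# tr -s " " squeezes every run of repeated spaces into a single space.
squeezed=$(printf 'a    b  c\n' | tr -s ' ')
echo "$squeezed"
```

On this sample the header and the rows with ".", 0.01 and 0.02 are kept, and the 4.56E-02 row is dropped. With awk's default field splitting, runs of spaces or tabs already count as one separator, so the tr -s step mainly matters for tools such as cut that treat every single space as a delimiter.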


Yes, it works, but the file also includes numbers like 0,04561, which is shown as 4,56E-02.

I don't understand the sentence "try using " " as a delimiter by passing the stream through a tr -s " " first". Could you please explain it?

Thanks!


written 3.8 years ago by basalganglia30
3.8 years ago by
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

A simple awk '$6 <= 0.02' a.txt would do, since the "." is also matched by that filter: "." is not a number, so awk compares it with 0.02 as a string, and "." sorts before "0".
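A quick way to check this behaviour is sketched below on a made-up one-column input (the variable names and values are only for illustration). It also contrasts < with <=, since the question asks for values less than or equal to 0.02. The string ordering of "." before "0" is assumed to follow plain byte order, as in the C locale.

```shell
# Hypothetical input: unknown value, below, equal to, and above the cutoff.
values=$(printf '.\n0.01\n0.02\n0.05\n')

# "." is not numeric, so awk falls back to a string comparison with "0.02".
strict=$(printf '%s\n' "$values" | awk '$1 < 0.02')
inclusive=$(printf '%s\n' "$values" | awk '$1 <= 0.02')

echo "strict (<):";     echo "$strict"
echo "inclusive (<=):"; echo "$inclusive"
```

Both filters keep the "." line, but only the inclusive one keeps a value of exactly 0.02. Note that a header line such as ExAC_ALL is compared as a string too and is dropped, because "E" sorts after "0".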

