Question: grep command on VCF
0
gravatar for jon.klonowski
9 months ago by
jon.klonowski70 wrote:

Summery: my grep command is not working properly when applied to a vcf but worked fine on a dummy test file. The grep command is putting too many records..

I am trying to pull all records with 1000 Genomes AF < 0.5 from a vcf. the vcf is annotated and for each SNP the AF from 1KGenomes is under an info column "controls_AF_popmax"

This is the surrounding area from an entry:

;non_cancer_AF_popmax=0.0004;controls_AF_popmax=0.0003;MCAP13=.;

This is my grep:

zless my_file.vcf.gz  | grep -v '^#' | grep ';controls_AF_popmax=0\.0[0-4]\|;controls_AF_popmax=\.;' > output.txt

It is pulling records where the AF value is between 1 and 2.338e-05 and the "."

I tried a test .txt and the function worked well:

fadsfad;controls_AF_popmax=0.0003;adsfadsf

dsafdsaf;controls_AF_popmax=.;fadsf

fdasfasd;controls_AF_popmax=0.1;fasdf

where the result is:

fadsfad;controls_AF_popmax=0.0003;adsfadsf

dsafdsaf;controls_AF_popmax=.;fadsf

unix bcf grep vcf • 2.0k views
ADD COMMENTlink modified 9 months ago • written 9 months ago by jon.klonowski70

I don't see the problem here, where does it fail?

ADD REPLYlink written 9 months ago by Asaf6.8k

when i reintroduce the header and query the file, the controls_AF_popmax are all possible numbers between 1-NA

ADD REPLYlink written 9 months ago by jon.klonowski70

I strongly recommend using existing programs for filtering vcf files, like bcftools or SnpSift.

bcftools view -i "controls_AF_popmax<0.5" input.vcf
ADD REPLYlink modified 9 months ago • written 9 months ago by finswimmer13k

This command is not functioning properly for me.

We are talking about it on: BCF Tools Filter on 1000Genomes Annotation

ADD REPLYlink modified 9 months ago • written 9 months ago by jon.klonowski70
1
gravatar for jon.klonowski
9 months ago by
jon.klonowski70 wrote:

The function works. The issue was something with the VCF file: unknown reason.

bcftools view -i "controls_AF_popmax<0.5" input.vcf

is the best answer if the VCF is formated corrected.

Future people:

Make sure your info field is formatted correctly. here I wanted the format liek this:

INFO=<id=controls_af_popmax,number=.,type=float,description="controls_af_popmax annotation="" provided="" by="" annovar"="">

Previously the type was string

ADD COMMENTlink written 9 months ago by jon.klonowski70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1248 users visited in the last hour