Filtering VCF file by INFO flag
7.1 years ago
cl10101 ▴ 80

I am trying to keep only the variants that have a dbSNP annotation. In my VCF file, this information is stored in the INFO column, like this:

Y       59003592        .       A       G       .       .       NS=1;AN=1;AC=1;CGA_XR=dbsnp.96|rs2140187;CGA_SDO=2      GT:PS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP        1:.:PASS:2035:2035,.:2035,.:44,.:-2035,0:-44,0:157:146,.:11

Information from header:

##INFO=<ID=CGA_XR,Number=A,Type=String,Description="Per-ALT external database reference (dbSNP, COSMIC, etc)">

According to the vcftools documentation, there is an option to filter sites by a specific INFO flag (--keep-INFO). I tried to use it like this:

vcftools --vcf file.vcf --out output --keep-INFO CGA_XR

but without success:

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf file.vcf
    --out output
    --keep-INFO CGA_XR

After filtering, kept 1 out of 1 Individuals
Error: Using INFO flag filtering on non flag type CGA_XR will not work correctly.

What is the proper usage of this function?

vcftools vcf • 8.0k views
7.1 years ago
bcftools view -i 'CGA_XR ~"dbsnp"' file.vcf
7.1 years ago

vcffilterjs

 java -jar dist/vcffilterjs.jar -e 'variant.getAttributeAsString("CGA_XR","").startsWith("dbsnp")' file.vcf
7.1 years ago

You could try:

cat <(grep '^#' yourfile.vcf) <(grep '|rs' yourfile.vcf) > outputfile.vcf

The first grep extracts the header lines; the second extracts every line containing the pattern |rs, which is hopefully specific enough. The outputs of both greps are then concatenated with cat to create the output file.
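
If matching |rs anywhere on the line turns out to be too loose, the same idea can be restricted to the INFO column. The following is only a rough sketch: the function name filter_dbsnp is made up for illustration, and it assumes an uncompressed, tab-delimited VCF in which the dbSNP reference appears inside the CGA_XR entry, as in the record shown in the question.

```shell
# Hypothetical helper (name chosen for illustration only).
# Assumes an uncompressed, tab-delimited VCF given as the first argument.
filter_dbsnp() {
    awk -F'\t' '
        /^#/ { print; next }                  # pass header lines through unchanged
        {
            n = split($8, fields, ";")        # column 8 is INFO, entries separated by ";"
            for (i = 1; i <= n; i++) {
                # keep the record if its CGA_XR entry mentions dbsnp
                if (fields[i] ~ /^CGA_XR=/ && fields[i] ~ /dbsnp/) { print; next }
            }
        }
    ' "$1"
}
```

It would be called as filter_dbsnp yourfile.vcf > outputfile.vcf. Unlike the plain grep, a record is kept only when the match sits in the CGA_XR entry of the INFO column, so a stray |rs in another column should not pull a line in.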

grep -E "(^#|\|rs)" in.vcf > out.vcf

That's a delicious solution!
