Selecting Variants based on annotations in a VCF file
3
0
Entering edit mode
8.1 years ago
petersam • 0

I have a VCF file generated with GATK and annotated with SNPeff. There is an annotation field described as

> ##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length |  AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'"

An example of a variant:

AC=5;AF=0.200;AN=10;ANN=C|intron_variant|MODIFIER|GAPDH|4214|transcript|GAPDH_transcript|sim4cc|n.1964-49T>C||||||

I have been using GATK SelectVariants to generate variant lists based on sample genotype and quality metrics but would also like to isolate all variants with different Annotation Impact. In the example above, I would want to get all variants described as "MODIFIER" I am not getting the format correct for the JEXL commands. Can someone suggest a command?

Thank you, S

vcf snpeff gatk variants • 4.9k views
ADD COMMENT
0
Entering edit mode

Just curious--do you want to produce a valid VCF as output, or just find the records of interest?

ADD REPLY
3
Entering edit mode
8.1 years ago
Brice Sarver ★ 3.8k

The JEXL stuff in the GATK is wonky at best; I avoid it whenever possible. It is much faster to parse your VCF using grep than it is to use SelectVariants in this case. Grep the lines that have MODIFIER and append them to your VCF. You'll probably need to do standard VCF manipulation (sorting, etc.) after the fact. Something like:

grep MODIFIER yourfile.vcf > my_results.vcf

You will, of course, need to modify your regex to avoid the lines with MODIFIER in the header.

ADD COMMENT
0
Entering edit mode

Thank you. That was incredibly easy.

ADD REPLY
2
Entering edit mode
8.1 years ago
Mitch Bekritsky ★ 1.3k

bcftools view is your friend here. I'd suggest reading the documentation on the linked page for more details, but something along the lines of

bcftools view -i "ANN~MODIFIER" -o modifier.vcf -Ov init.vcf

should get you a VCF file with the variants you're looking for.

ADD COMMENT
1
Entering edit mode
8.1 years ago

using vcffilterjs and the following script: https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

function accept(v) {
var ann =   v.getAttributeAsList("ANN");
for(var i in ann)
    {
    var tokens = ann[i].split(/\|/);
    if( tokens.length>2 && tokens[2] == "MODIFIER" ) return true;
    }
return false;
}
accept(variant);
ADD COMMENT

Login before adding your answer.

Traffic: 882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6