Filtering multiple-samples VCF by genotype with GATK
6 weeks ago
Timotheus ▴ 20

Hello,

I'm trying to filter a VCF file with two samples, let's call them 'sample1' and 'sample2', using GATK. I'd like to retain only sites homozygous in sample1 and heterozygous in sample2. To flag the relevant SNPs, I tried the following GATK command:

gatk VariantFiltration \
-V /path/master.vcf \
-O /path/filtered.vcf \
--genotype-filter-expression "vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()" \
--genotype-filter-name "HomInSample1_HetInSample2"

I got a series of warnings (see below), and none of the SNPs were flagged:

17:03:23.742 WARN  JexlEngine - ![15,26]: 'vc.getGenotype(sample1).isHomRef() && vc.getGenotype(sample2).isHet();' undefined variable sample1

Would anyone be able to fix it? I cannot find an example of how to do such operations in the documentation.

6 weeks ago
LChart 810

I think you need to either escape the inner quotes or use single outer quotes with double inner quotes:

'vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()'

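With unescaped double quotes nested inside double quotes, the shell strips the inner quotes before GATK ever sees the expression, so JEXL looks for a variable named sample1 instead of using the string literal "sample1".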
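To check what the shell actually hands to GATK, here is a minimal sketch that prints the expression under each quoting style. The variable names `broken`, `single_quoted`, and `escaped` are my own, and the paths are the placeholders from the question; I have only guarded the GATK call so the snippet is harmless on a machine without it.

```shell
# Nested, unescaped double quotes: the shell removes the inner quotes.
broken="vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()"

# Single outer quotes keep the inner double quotes intact.
single_quoted='vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()'

# Backslash-escaped inner quotes produce the same string.
escaped="vc.getGenotype(\"sample1\").isHomRef() && vc.getGenotype(\"sample2\").isHet()"

# Print all three so the difference is visible before running anything.
printf '%s\n' "$broken" "$single_quoted" "$escaped"

# Pass the correctly quoted expression to GATK (skipped if gatk is not on PATH).
if command -v gatk >/dev/null 2>&1; then
    gatk VariantFiltration \
        -V /path/master.vcf \
        -O /path/filtered.vcf \
        --genotype-filter-expression "$single_quoted" \
        --genotype-filter-name "HomInSample1_HetInSample2"
fi
```

The first printed line has no quotes around sample1 and sample2, which matches the expression shown in the JexlEngine warning; the other two lines keep them.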


Yes, you're right, thank you very much!
