Filtering a multi-sample VCF by genotype with GATK
6 weeks ago
Timotheus ▴ 20

Hello,

I'm trying to filter a VCF file with two samples, let's call them 'sample1' and 'sample2', using GATK. I'd like to retain only sites homozygous in sample1 and heterozygous in sample2. To flag the relevant SNPs, I tried the following GATK command:

gatk VariantFiltration \
-V /path/master.vcf \
-O /path/filtered.vcf \
--genotype-filter-expression "vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()" \
--genotype-filter-name "HomInSample1_HetInSample2"


I got a series of warnings (see below), and none of the SNPs were flagged:

17:03:23.742 WARN  JexlEngine - ![15,26]: 'vc.getGenotype(sample1).isHomRef() && vc.getGenotype(sample2).isHet();' undefined variable sample1


Would anyone be able to help me fix it? I cannot find an example of this kind of operation in the documentation.

GATK SNPs • 205 views
6 weeks ago
LChart 810

I think you need to either escape the inner quotes or use single outer quotes with double inner quotes:

'vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()'

Otherwise the shell strips the unescaped inner double quotes before GATK sees the expression, so JEXL looks for a variable named sample1 instead of using the literal string "sample1".
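To see what the shell actually hands to GATK in each case, here is a small sketch that stores the expression in variables and prints them. The variable names (bad, good, escaped) are only illustrative and are not part of GATK; the expression itself is the one from the question.

```shell
# Double outer quotes with unescaped inner double quotes:
# the shell ends and restarts the quoted string around sample1/sample2,
# so the inner quotes are removed before the program sees the argument.
bad="vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()"

# Single outer quotes: the inner double quotes are kept literally.
good='vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()'

# Double outer quotes with backslash-escaped inner quotes: same result as above.
escaped="vc.getGenotype(\"sample1\").isHomRef() && vc.getGenotype(\"sample2\").isHet()"

printf '%s\n' "$bad"      # vc.getGenotype(sample1).isHomRef() && vc.getGenotype(sample2).isHet()
printf '%s\n' "$good"     # vc.getGenotype("sample1").isHomRef() && vc.getGenotype("sample2").isHet()
printf '%s\n' "$escaped"  # identical to the line above
```

The first line printed matches the expression shown in the JexlEngine warning, which is why sample1 is reported as an undefined variable. Either of the other two forms should be what you pass to --genotype-filter-expression in the original command.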


Yes, you're right, thank you very much!