I have a multi-sample VCF and want to extract the IDs of the samples that carry rare, impactful SNPs, i.e. samples with at least one alternative allele at a locus within the region of interest.
I am running the following on a normalised multi-sample vcf which has been annotated with VEP:
bcftools view -r REGION OF INTEREST | filter_vep -filter 'IMPACT is MODERATE or IMPACT is HIGH' | filter_vep -filter 'MAX_AF < 0.01 or not MAX_AF' | bcftools view -i 'GT="alt"' | bcftools query -f '[%CHROM:%POS %SAMPLE %GT\n]' > output.txt
I work in an airlocked environment with patient data, so I can't share the VCF I am using, but it is a standard multi-sample VCF created by merging with bcftools.
I get an output with a variant and ID per line irrespective of the genotype e.g.:
chr1:123:456 sampleID123 ./.
chr1:123:456 sampleID234 ./.
chr1:123:456 sampleID156 0/1
chr1:123:456 sampleID123 ./.
I would like to just get the sample IDs with at least one alt genotype.
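
To make the desired result concrete, here is a minimal sketch of a post-filter. It assumes the three-column output shown above (variant, sample ID, genotype, separated by whitespace). The function name alt_samples and the file name samples_with_alt.txt are hypothetical, and this is a workaround on the text output rather than a fix to the bcftools expression itself.

```shell
# Hypothetical helper: print the unique sample IDs that have at least one
# alternative allele. Assumes whitespace-separated columns:
#   variant  sampleID  genotype
alt_samples() {
  awk '{
    # Split the genotype on "/" (unphased) or "|" (phased).
    n = split($3, alleles, "[/|]")
    for (i = 1; i <= n; i++) {
      # Any allele other than reference (0) or missing (.) counts as alt.
      if (alleles[i] != "0" && alleles[i] != ".") { print $2; break }
    }
  }' "$@" | sort -u
}

# Example usage (file names are placeholders):
# alt_samples output.txt > samples_with_alt.txt
```

On the example above this would keep only sampleID156. As far as I understand, bcftools query also accepts -i itself, and a sample-level expression such as 'GT="alt"' given there (with the format inside square brackets) may restrict the printed rows to matching samples, but I have not confirmed that on my data.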