I've looked at bedtools and GATK SelectVariants, but I don't see how either can address my requirement, so I hope the community can help.
I want to subset a VCF file based on 4 attributes: CHROM, POS, REF and ALT. I have 2 VCF files, and I wish to exclude all entries from VCF1 that match an entry in VCF2 on all 4 of those values. Most of the above-mentioned tools compare only on POS, even if they accept a VCF file to filter by. As far as I can tell, none of them requires an exact match on the ALT allele, even if both VCF inputs were processed to split all multi-allelic sites.
The closest I can get is by using bcftools annotate to copy over an INFO attribute (say, INFO/AC) under a new name (say, INFO/DUMMY_AC), so that I can filter by that new name. The manual on bcftools annotate states:
When REF and ALT are present, only matching VCF records will be annotated.
This works for me when my ALT alleles are split, but it only marks the matching records; it does not filter them out.
Is there any subset tool that will help me compare by custom attributes or do I have to write my own script for it?
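
If it does come down to a script, below is a minimal Python sketch of the kind of thing I have in mind. It is only an illustration under some assumptions: both files are tab-delimited VCFs (plain or gzip/bgzip compressed), multi-allelic sites have already been split so that ALT holds a single allele, and VCF2 is small enough for its keys to fit in memory. The function names (open_text, variant_keys, exclude_matching) are my own and not part of any existing tool.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file for text reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def variant_key(line):
    """Return (CHROM, POS, REF, ALT) from one VCF data line.

    Assumes tab-delimited columns in the standard VCF order:
    CHROM, POS, ID, REF, ALT, ...
    """
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return (chrom, pos, ref, alt)


def variant_keys(path):
    """Collect the (CHROM, POS, REF, ALT) keys of every record in a VCF."""
    keys = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            keys.add(variant_key(line))
    return keys


def exclude_matching(vcf1_path, vcf2_path, out_path):
    """Write VCF1 records whose 4-value key is absent from VCF2.

    Header lines from VCF1 are copied unchanged. The output is written
    uncompressed. Returns the number of records kept.
    """
    exclude = variant_keys(vcf2_path)
    kept = 0
    with open_text(vcf1_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
                continue
            if not line.strip():
                continue
            if variant_key(line) not in exclude:
                dst.write(line)
                kept += 1
    return kept
```

Calling exclude_matching("VCF1.vcf", "VCF2.vcf", "VCF1.filtered.vcf") would then keep only the VCF1 records with no exact CHROM/POS/REF/ALT match in VCF2 (the file names here are placeholders). The comparison is purely textual, so it does not normalise or left-align indels; differently represented but equivalent alleles would not be treated as matches.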