I have a VCF file with 928 unrelated samples. It has been annotated with the appropriate rsIDs. How do I query the file for each sample along with the corresponding ID? The analysis requires the sample name, the ref/alt alleles, and the rsID. If I use vcf-subset, I get the samples but not the ID. Thanks for any directions.
gunzip -c input.vcf.gz | awk -F '\t' '($0 ~ /^#/ || $3=="rs1234")'
or GATK SelectVariants with
--keepIDs / -IDs
List of variant IDs to select
If a file containing a list of IDs is provided to this argument, the tool will only select variants whose ID field is present in this list of IDs. The matching is done by exact string matching. The expected file format is simply plain text with one ID per line.
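Both options above select whole VCF records by ID. If what you need is one row per sample with the sample name, ID, REF, ALT and genotype, a rough awk sketch along these lines may help. It assumes a standard VCF layout (sample columns start at column 10) and that GT is the first FORMAT sub-field; `vcf_long` is just a hypothetical helper name, not part of any tool.

```shell
# Sketch: print SAMPLE, ID, REF, ALT, GT for every sample at every variant.
# Assumptions: tab-separated VCF, samples from column 10 onward,
# GT is the first colon-separated FORMAT sub-field.
vcf_long() {
  awk -F '\t' -v OFS='\t' '
    /^##/ { next }                       # skip meta-information lines
    /^#CHROM/ {                          # header line: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                # f[1] holds GT under the assumption above
        print name[i], $3, $4, $5, f[1]
      }
    }'
}

# Usage, with the same input file as the one-liner above:
# gunzip -c input.vcf.gz | vcf_long
```

With 928 samples this produces 928 lines per variant, so it is probably worth filtering by ID first (for example with the awk one-liner above) and piping the result into `vcf_long`. If bcftools is installed, `bcftools query -f '[%SAMPLE\t%ID\t%REF\t%ALT\t%GT\n]' input.vcf.gz` should give a similar table.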