I have a multi-sample VCF file. An example variant is shown below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 03-071 04-051 04-071 06-044 07-085 10-009
chr1 6526093 . T C 197.77 . AC1=1;AC=1;AF1=0.5 GT:GQ:DP:PL:AD 0/1 1/1 0/0 1/1 1/1 0/1
For each variant, I need to retrieve the names of the samples that carry a given genotype. If the genotype is "0/1", the output should contain the first 5 columns of the variant, followed by the matching sample names (comma-separated) in the 6th column:
chr1 6526093 . T C 03-071,10-009
If the genotype is "1/1" it should output:
chr1 6526093 . T C 04-051,06-044,07-085
The original file has more than 500 samples, and I need the output in the format above. Is there a tool that can do this, at least in part, so that I can tweak its output into the desired format?
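The desired transformation can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not an existing tool: the function name `samples_by_genotype` is hypothetical, the file is assumed to be tab-delimited with a `#CHROM` header line, GT is assumed to be the first FORMAT subfield, and genotypes are compared as exact strings (so "0/1" will not match "1/0" or the phased "0|1").

```python
def samples_by_genotype(vcf_lines, genotype):
    """Yield one tab-separated row per variant: the first 5 columns plus
    a comma-separated list of the samples whose GT equals `genotype`.

    Hypothetical helper for illustration only.
    """
    sample_names = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")  # assumption: tab-delimited input
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample columns start at column 10
            continue
        # Assumption: GT is the first colon-separated subfield of each sample column.
        matches = [
            name
            for name, call in zip(sample_names, fields[9:])
            if call.split(":")[0] == genotype
        ]
        if matches:  # variants with no matching sample are skipped
            yield "\t".join(fields[:5] + [",".join(matches)])


# The example variant from above, rebuilt with tab separators.
header = "\t".join(
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "03-071 04-051 04-071 06-044 07-085 10-009".split()
)
record = "\t".join(
    "chr1 6526093 . T C 197.77 . AC1=1;AC=1;AF1=0.5 GT:GQ:DP:PL:AD "
    "0/1 1/1 0/0 1/1 1/1 0/1".split()
)

for row in samples_by_genotype([header, record], "0/1"):
    print(row)
```

For a real file, an open file handle could be passed in place of the list, since the function only iterates over lines and should therefore cope with more than 500 sample columns without loading the whole file.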