Hi there,
I'm working with a VCF file of this type (see image below).
I need to run a tool on this VCF so that I can genotype a particular human individual. The problem is that the tool doesn't handle the '.' entries for the individuals in my pangenome; I need those to be '.|.' instead.
I've tried this awk command:
awk 'FNR > 719 {sub(/[[:space:]].[[:space:]]/, ".|."); print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20}' pangenome_ref_guided_GRCh38.vcf > temp.vcf
However, it only substitutes the first '.' (green arrow) and leaves the following ones untouched (red arrows). This is the opposite of what I need, as the '.' entries that need to become '.|.' are the ones in the section within the red lines.
Thanks in advance, and sorry, but I'm new to awk and to handling VCF files in general.
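For illustration, here is a minimal sketch of a per-field approach (not necessarily the command from the answer referred to below). It assumes the columns are tab-separated and that the sample genotypes start at column 10, as in a standard VCF; the function name fix_missing_gt is made up for this example. Two things are worth knowing about the attempt above: sub() replaces only the first match on each line, and an unescaped '.' in a regex matches any character, not just a literal dot.

```shell
# Hypothetical helper: turn a bare "." in the sample columns into ".|.".
# Assumes tab-separated input with samples from column 10 onward.
fix_missing_gt() {
  awk 'BEGIN { FS = OFS = "\t" }        # read and write tab-separated fields
       /^#/  { print; next }            # header lines pass through unchanged
       {
         for (i = 10; i <= NF; i++)     # sample columns only
           if ($i == ".") $i = ".|."    # exact match, so ID/QUAL/FILTER are untouched
         print
       }' "$@"
}

# Usage (file names taken from the command above):
#   fix_missing_gt pangenome_ref_guided_GRCh38.vcf > temp.vcf
```

Comparing each field with == instead of using a regex avoids having to escape the dot, and starting the loop at column 10 leaves the '.' values in the fixed columns alone.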
Hey @Pierre Lindenbaum,
Thanks a lot! It worked perfectly. If I may, could I ask you a question about the command you wrote?
What does /^#/ do specifically? My book says it uses an "alternative form" for certain control letters; however, I'm not sure that's the case here. Thanks again. This approach also kept the first lines of the file, which I would otherwise have needed to add back.
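As a small sketch of what the pattern does: in awk, text between slashes is a regular expression, and ^ anchors it to the start of the line, so /^#/ selects lines that begin with '#', which in a VCF are the header lines. The "alternative form" wording presumably comes from the description of the '#' flag in printf format strings, which is a separate feature. The function name and the input below are made up for this example.

```shell
# Hypothetical helper: label each line as header or record.
label_lines() {
  awk '/^#/ { print "header: " $0; next }   # line starts with "#"
            { print "record: " $0 }' "$@"   # everything else
}

# Made-up input: two header lines and one data line.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\nchr1\t10\n' | label_lines
```

The next statement stops processing of the current line, so header lines never reach the second rule.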