formatting VCF file for analyses
20 months ago
Matteo Ungaro

Hi there,

I'm working with a VCF file with the structure shown in the image below:

[Image: file structure]

I need to process this VCF with a tool that genotypes a particular human individual. The problem is that this tool doesn't handle the '.' entries (missing genotypes) of the individuals in my pangenome; it needs them to be '.|.' instead.

I've tried this awk command

awk 'FNR > 719 {sub(/[[:space:]].[[:space:]]/, ".|."); print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20}' pangenome_ref_guided_GRCh38.vcf > temp.vcf

but it only substitutes the first '.' (green arrow) and leaves the following ones untouched (red arrows). This is the opposite of what I need: the '.' entries that have to become '.|.' are the ones in the section marked by the red lines.
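A minimal sketch of why the attempt behaves this way, assuming a tab-separated toy record (hypothetical data) and a tightened version of the pattern (tabs instead of [[:space:]], escaped dot, tabs kept in the replacement). awk's sub() replaces only the first match on the line, and on a VCF data line the first lone '.' is often the ID, QUAL, FILTER or INFO column rather than a sample column. In the original regex the unescaped '.' also matches any single character, and the replacement ".|." drops the surrounding whitespace, merging neighbouring fields.

```shell
#!/bin/sh
# Hypothetical toy VCF data record (tab-separated): ID, QUAL, FILTER and INFO
# are ".", and the last two sample columns (fields 11 and 12) are missing.
line=$(printf 'chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0|1\t.\t.')

# sub() rewrites only the FIRST match on the line. The pattern is a tightened
# version of the original: tab-delimited, with the dot escaped.
first_only=$(printf '%s\n' "$line" | awk '{ sub(/\t\.\t/, "\t.|.\t"); print }')

# Fields 3 (ID), 11 and 12 (samples): only the ID column was changed.
printf '%s\n' "$first_only" | cut -f3,11,12
```

Switching to gsub() would not fix this either: it still cannot tell the fixed columns from the sample columns, and because two adjacent '.' fields share the tab between them, a tab-delimited pattern skips every second one. Looping over the sample fields, as in the answer below, avoids both problems.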

Thanks in advance, and sorry: I'm new to awk and to handling VCF files in general.

Tags: VCF, awk
20 months ago

Not tested:

awk -F '\t' '/^#/ {print;next;} {OFS="\t";for(i=10;i<=NF;i++){ if($i==".") $i=".|."; } print;}' in.vcf
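The same one-liner, spread over several lines with comments and wrapped in a shell function, can be checked on a tiny toy VCF before running it on the real file. The function name fill_missing_gt and the toy records are hypothetical; the awk logic is the answer's. Only columns 10 and later (the sample columns) are touched, so a '.' in ID, QUAL, FILTER or INFO stays as it is, and lines starting with # are passed through unchanged.

```shell
#!/bin/sh
# fill_missing_gt is a hypothetical helper name; the awk program inside is the
# answer's one-liner, spread over several lines with comments.
fill_missing_gt() {
    awk -F '\t' '
        /^#/ { print; next }               # header lines are printed unchanged
        {
            OFS = "\t"                     # rebuilt records stay tab-separated
            for (i = 10; i <= NF; i++)     # VCF sample columns start at field 10
                if ($i == ".") $i = ".|."  # a bare "." becomes ".|."
            print
        }' "$@"                            # reads the files given, or stdin if none
}

# Hypothetical toy VCF: a meta line, the #CHROM header and one data record
# whose last two samples (S2, S3) have missing genotypes.
toy=$(printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\nchr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0|1\t.\t.')

printf '%s\n' "$toy" | fill_missing_gt
```

With the real data this would be something like: fill_missing_gt pangenome_ref_guided_GRCh38.vcf > temp.vcf. Only fields that are exactly '.' are rewritten; entries such as './.' are left as they are.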

Hey @Pierre Lindenbaum,

Thanks a lot! It worked perfectly. If I may, could I ask you a question about the command you wrote?

What does /^#/ do, specifically? My book says it uses an "alternative form" for certain control letters, but I'm not sure that is the case here. Thanks again: this approach also kept the header lines of the file, which I would otherwise have needed to add back.

/^#/ {      # if the line matches the regular expression ^# (it starts with a hash)
    print   # print the whole line
    next    # skip the remaining rules and read the next line
}
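In /^#/, the ^ is the regular-expression anchor for "start of line"; it is unrelated to the caret notation for control characters (for example ^M for a carriage return), which is probably what the book's "alternative form" refers to. A small sketch on hypothetical input shows the difference the anchor makes: with ^ only the header lines match, without it any line containing a # anywhere would also be treated as a header.

```shell
#!/bin/sh
# Hypothetical input: two VCF header lines, then a data line whose ID
# contains a # that is not at the start of the line.
input=$(printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\nchr1\t100\tid#1')

# ^ anchors the pattern to the start of the line: only the header lines match.
anchored=$(printf '%s\n' "$input" | awk '/^#/ { n++ } END { print n + 0 }')

# Without ^, every line containing a # anywhere matches, including the data line.
unanchored=$(printf '%s\n' "$input" | awk '/#/ { n++ } END { print n + 0 }')

echo "lines starting with #: $anchored; lines containing #: $unanchored"
```

This prints "lines starting with #: 2; lines containing #: 3".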