How to add an INFO field to a vcf.gz file
3.6 years ago
grkhan117 • 0

Hi, I have a vcf.gz file with the following header line:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  FORMAT  Sample1        Sample2        Sample3

I want to split it by sample, but the various tools I have tried fail with an error saying there is no INFO field. How can I add the INFO field to the vcf.gz file?

vcf next-gen • 1.3k views
3.6 years ago

Not tested:

awk 'BEGIN{FS=OFS="\t"} /^##/{print;next} {$7 = $7 OFS (/^#CHROM/ ? "INFO" : "."); print}' input.vcf
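
Since the file in the question is gzip-compressed, it may help to see the same idea applied to a compressed stream. The following is only a minimal sketch, assuming gzip and a POSIX awk are available; the file names toy.vcf.gz and toy.with_info.vcf.gz and the toy records are made up for illustration.

```shell
# Build a tiny INFO-less, gzip-compressed VCF (hypothetical content).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tFORMAT\tSample1\tSample2\n1\t100\t.\tA\tG\t50\tPASS\tGT\t0/1\t1/1\n' | gzip -c > toy.vcf.gz

# Decompress to a stream, insert the INFO column after FILTER (column 7),
# and compress the result again.
gzip -dc toy.vcf.gz |
  awk 'BEGIN { FS = OFS = "\t" }
       /^##/     { print; next }                      # meta lines pass through unchanged
       /^#CHROM/ { $7 = $7 OFS "INFO"; print; next }  # name the new column
                 { $7 = $7 OFS "."; print }           # "." marks an empty INFO value
      ' | gzip -c > toy.with_info.vcf.gz

# Inspect the repaired file.
gzip -dc toy.with_info.vcf.gz
```

If the result needs to be indexed with tabix, it would have to be compressed with bgzip rather than plain gzip.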

It gives an error: awk: read error (Bad address)


read error (Bad address)

It's not a problem with the awk script.

3.6 years ago

This should work, although you'll still end up with INFO-less vcf files:

for i in 9 10 11; do
 sample=$(zgrep -m1 '^#CHROM' multisample.vcf.gz | cut -f"$i")
 zgrep '^##' multisample.vcf.gz > "$sample.vcf"
 zgrep -v '^##' multisample.vcf.gz | cut -f1-8,"$i" >> "$sample.vcf"
done

How can I split the file if I have a list of samples and only want to extract the samples in that list?


Not tested:

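iMaxSample=$(zgrep -m1 '^#CHROM' multisample.vcf.gz | awk -F'\t' '{print NF}')
zgrep '^##' multisample.vcf.gz > header.txt
# Columns 1-8 are the fixed fields (there is no INFO here), so samples start at column 9.
for i in $(seq 9 "$iMaxSample"); do
 sample=$(zgrep -m1 '^#CHROM' multisample.vcf.gz | cut -f"$i")
 # -x -F: match the whole line as a fixed string, never as a regex or inside a longer name
 if grep -qxF -- "$sample" list.of.samples.txt; then
  { cat header.txt; zgrep -v '^##' multisample.vcf.gz | cut -f1-8,"$i"; } > "$sample.vcf"
 fi
done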

Again, if you start with a malformed multi-sample VCF, this snippet will generate malformed single-sample VCF files. I would strongly recommend repairing the original VCF file (maybe by forcing an empty INFO column?) instead of using this code.
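
If the file is repaired first, the column indices shift by one: FORMAT moves to column 9 and the samples start at column 10. A rough sketch of the split under that assumption follows; the names toy.fixed.vcf.gz and toy.samples.txt and the toy records are hypothetical, and the toy file stands in for an already repaired VCF.

```shell
# Toy stand-ins: a repaired multi-sample VCF (INFO present) and a sample list.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\tSample2\tSample3\n1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' | gzip -c > toy.fixed.vcf.gz
printf 'Sample1\nSample3\n' > toy.samples.txt

# Read the #CHROM line once and count its tab-separated columns.
header=$(gzip -dc toy.fixed.vcf.gz | awk '/^#CHROM/ { print; exit }')
ncols=$(printf '%s\n' "$header" | awk -F'\t' '{ print NF }')

# With INFO in place, columns 1-9 are fixed and samples start at column 10.
i=10
while [ "$i" -le "$ncols" ]; do
  sample=$(printf '%s\n' "$header" | cut -f"$i")
  # Keep only samples that appear as a whole line in the list.
  if grep -qxF -- "$sample" toy.samples.txt; then
    {
      gzip -dc toy.fixed.vcf.gz | grep '^##'                          # meta lines
      gzip -dc toy.fixed.vcf.gz | grep -v '^##' | cut -f1-9,"$i"     # header + records
    } > "$sample.vcf"
  fi
  i=$((i + 1))
done
```

With the toy inputs this writes Sample1.vcf and Sample3.vcf and skips Sample2, each file carrying the nine fixed columns plus one sample column.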
