How to add FORMAT/TAG = "." to all samples in a vcf file via bcftools?
5 weeks ago
Марта • 0

Hello, all. I'm trying to add a FORMAT/TAG annotation to all samples in my VCF file, and I want it to be a missing value (".") or 0 for all of them. Is there a way to do this with bcftools?

vcf bcftools • 445 views
5 weeks ago

Using vcffilterjdk: https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/

bcftools view in.vcf.gz |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=TAG,Number=1,Type=Integer,Description=\"x\">\n");} {print}' |\
java -jar ${JVARKIT_DIST}/jvarkit.jar vcffilterjdk -e 'return new VariantContextBuilder(variant).genotypes(variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).attribute("TAG",0).make()).collect(Collectors.toList())).make();'
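
For a tag with one constant value, a plain awk pass may also be enough. This is only a minimal sketch, not a bcftools or jvarkit feature: add_format_tag is a hypothetical helper name, and it assumes uncompressed VCF text on stdin, at least one sample column, and that no sample has dropped trailing FORMAT fields (the VCF format allows that, and the appended value would then be misaligned).

```shell
# Hypothetical helper: append FORMAT key $1 with constant value $2 to every sample.
# Reads uncompressed VCF text on stdin and writes VCF text to stdout.
add_format_tag() {
  awk -v tag="$1" -v val="$2" 'BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }        # meta lines pass through unchanged
    /^#CHROM/ {                  # declare the tag just before the column header
      printf("##FORMAT=<ID=%s,Number=1,Type=Integer,Description=\"x\">\n", tag)
      print; next
    }
    NF >= 10 {                   # data line with at least one sample column
      $9 = $9 ":" tag            # extend the FORMAT column
      for (i = 10; i <= NF; i++) $i = $i ":" val
    }
    { print }'
}

# Usage (in.vcf.gz as in the answer above); calls can be chained, one per tag:
#   gunzip -c in.vcf.gz | add_format_tag TAG1 . | add_format_tag TAG2 0 > out.vcf
```

Because each call only appends to the existing FORMAT column and sample fields, chaining calls keeps tags added earlier.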

Thank you for the answer! This works, although I noticed that if I add a second TAG to the edited VCF with the same method, it replaces the first one I added. Could you help me figure out how to add more than one TAG?


Could you help me figure out how to add more than one TAG?

Run the script twice, changing the name "TAG" to "TAG2" on the second run.


I did, I ran

bcftools view in.vcf.gz |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=TAG1,Number=1,Type=Integer,Description=\"x\">\n");} {print}' |\
java -jar jvarkit.jar vcffilterjdk -e 'return new VariantContextBuilder(variant).genotypes( variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).attribute("TAG1",0).make()).collect(Collectors.toList()) ).make();' > out1.vcf

and then

bcftools view out1.vcf.gz |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=TAG2,Number=1,Type=Integer,Description=\"x\">\n");} {print}' |\
java -jar jvarkit.jar vcffilterjdk -e 'return new VariantContextBuilder(variant).genotypes( variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).attribute("TAG2",0).make()).collect(Collectors.toList()) ).make();' > out2.vcf

In the resulting out2.vcf, both ##FORMAT=<ID=TAG1..> and ##FORMAT=<ID=TAG2..> are present, but all of the samples only have TAG2 in their FORMAT fields. I'm not familiar with Java so I'm not sure what can be done to correct this. Could you help please?


works on my machine:

gunzip -c in.vcf.gz |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=TAG1,Number=1,Type=Integer,Description=\"x\">\n");printf("##FORMAT=<ID=TAG2,Number=1,Type=Integer,Description=\"x\">\n");} {print}' |\
java -jar jvarkit.jar vcffilterjdk -e 'return new VariantContextBuilder(variant).genotypes( variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).attribute("TAG1",0).make()).collect(Collectors.toList()) ).make();' |\
java -jar jvarkit.jar vcffilterjdk -e 'return new VariantContextBuilder(variant).genotypes( variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).attribute("TAG2",0).make()).collect(Collectors.toList()) ).make();' |\
grep -v "##" | cut -f 9- | head

FORMAT  S1  S2  S3  S4  S5
GT:PL:TAG1:TAG2 0/0:0,9,47:0:0  0/0:0,18,73:0:0 0/0:0,18,73:0:0 0/0:0,33,116:0:0    1/1:95,24,0:0:0
GT:PL:TAG1:TAG2 0/0:0,15,57:0:0 0/1:31,0,5:0:0  0/1:31,0,5:0:0  0/0:0,9,42:0:0  0/0:0,24,69:0:0
GT:PL:TAG1:TAG2 0/0:0,33,122:0:0    0/0:0,39,135:0:0    0/0:0,39,135:0:0    1/1:100,30,0:0:0    0/0:0,27,109:0:0
GT:PL:TAG1:TAG2 0/1:37,0,50:0:0 0/0:0,22,116:0:0    0/0:0,22,116:0:0    0/0:0,21,94:0:0 0/0:0,12,62:0:0
GT:PL:TAG1:TAG2 0/0:0,18,83:0:0 0/1:24,0,40:0:0 0/1:24,0,40:0:0 0/0:0,27,111:0:0    0/0:0,10,78:0:0
GT:PL:TAG1:TAG2 0/1:70,0,159:0:0    0/0:0,15,225:0:0    0/0:0,15,225:0:0    0/0:0,27,231:0:0    0/0:0,27,168:0:0
GT:PL:TAG1:TAG2 0/0:0,18,84:0:0 1/1:111,27,0:0:0    1/1:111,27,0:0:0    0/0:0,21,92:0:0 0/0:0,45,141:0:0
GT:PL:TAG1:TAG2 0/0:0,27,109:0:0    0/0:0,24,95:0:0 0/0:0,24,95:0:0 1/1:109,27,0:0:0    0/0:0,42,140:0:0
GT:PL:TAG1:TAG2 0/0:0,33,124:0:0    0/0:0,30,117:0:0    0/0:0,30,117:0:0    0/0:0,27,111:0:0    1/1:111,33,0:0:0
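
The head above only shows the first records. To check a whole file, one option is to count the distinct FORMAT column values. A small sketch, where format_summary is a hypothetical helper name and the input is assumed to be uncompressed VCF text:

```shell
# Hypothetical helper: count how often each distinct FORMAT string (column 9) occurs.
# Reads uncompressed VCF text on stdin.
format_summary() {
  grep -v '^#' | cut -f 9 | sort | uniq -c
}

# Usage (out2.vcf is the file name from the question):
#   format_summary < out2.vcf
```

If every record carries both tags, each FORMAT string listed should contain both TAG1 and TAG2; any line without TAG1 points to records where it was lost.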