Delete some INFO header lines and the corresponding column data
2.2 years ago

Dear all

I have two VCF files that I wish to concatenate, but they have different INFO tags. I would like to make the two files match by deleting the extra INFO tags from one of them.

I have tried "bcftools annotate", but I found that it returns only the header, without the data. I have also read "Remove columns from the VCF file using vcftools", which looks very promising:

awk -v OFS="\t" '!/##/ {$1=$2=$3=$4="";print}' test.vcf|sed 's/^\s\+//g'

Along the lines of this bash command, I wonder whether it is possible to discard specific INFO fields (let's say INFO/RPBZ, INFO/MQBZ and INFO/BQBZ) and the corresponding header lines. Does OFS accept more than one separator at once? If I could use OFS="\t" plus OFS=";", the above command could easily be modified to select the precise fields. In that case, how do I match the correct header lines?
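
For reference, awk's OFS is a single output string used when a record is rebuilt, so it cannot hold two separators at once (FS, the input separator, may be a regular expression). One way around this is to keep tab as the column separator and split only column 8 (INFO) on ";" inside awk. The following is a minimal sketch under that assumption; the function name `drop_info_tags` and the file names in the usage comment are hypothetical, and `bcftools annotate -x` (mentioned in this thread) is the dedicated tool for the same job.

```shell
# Hypothetical helper: remove the given INFO keys (comma-separated, without
# the "INFO/" prefix) from an uncompressed VCF read on stdin.
# Usage: drop_info_tags "RPBZ,MQBZ,BQBZ" < in.vcf > out.vcf
drop_info_tags() {
    awk -v tags="$1" '
    BEGIN {
        FS = OFS = "\t"
        n = split(tags, t, ",")
        for (i = 1; i <= n; i++) drop[t[i]] = 1
    }
    # Header: skip the ##INFO definition of every dropped key.
    /^##INFO=<ID=/ {
        id = $0
        sub(/^##INFO=<ID=/, "", id)
        sub(/[,>].*$/, "", id)
        if (id in drop) next
    }
    # All other header lines pass through unchanged.
    /^#/ { print; next }
    # Records: rebuild column 8 (INFO) without the dropped keys.
    {
        m = split($8, kv, ";")
        out = ""
        for (i = 1; i <= m; i++) {
            key = kv[i]
            sub(/=.*/, "", key)
            if (!(key in drop)) out = (out == "" ? kv[i] : out ";" kv[i])
        }
        $8 = (out == "" ? "." : out)
        print
    }'
}
```

Header lines are matched by extracting the ID from each `##INFO=<ID=...` line and comparing it with the same list of keys, which answers the "how do I match the correct header lines" part without a second pass.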

Thanks in advance.

Pablo

vcf bash vcftools • 3.1k views

Oops... I eventually realized there is no need for a complex OFS; it can be accomplished with OFS=";". Sorry for taking up your time.

    awk -v OFS=";" '!/##/ {$5=$6=$7=$8=$9=""}1' "$file" | sed 's/^\s\+//g' > "$PRETWEAKBIFILE"

    bcftools annotate -x INFO/RPBZ,INFO/MQBZ,INFO/MQSBZ,INFO/BQBZ,INFO/SCBZ -O z -o "$HEADFILE" --threads 6 "$file"
    bcftools reheader -h "$HEADFILE" -o "$TWEAKBIFILE" --threads 6 "$PRETWEAKBIFILE"
    bcftools index -t "$TWEAKBIFILE"

Pablo


This is useless: bcftools annotate -x already removes the INFO tags from both the header and the variants.


Sorry Pierre... If that is OK with other users, fine. I double-checked: the infile I passed to "bcftools annotate" had the header and the variants, but the outfile had only the header. Pablo

$ bcftools view rotavirus_rf.vcf.gz | grep VDB -m2
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
RF01    970 .   A   C   48.6696 .   DP=36;VDB=0.693968;SGB=10.3229;RPB=0.658863;MQB=1;MQSB=1;BQB=0.572843;MQ0F=0;ICB=0.425;HOB=0.32;AC=2;AN=10;DP4=19,7,3,5;MQ=60   GT:PL   0/0:0,9,47  0/0:0,18,73 0/0:0,18,73 0/0:0,33,116    1/1:95,24,0

After removing 'INFO/VDB', the header line INFO ID=VDB is gone. No line contains VDB any more, apart from the header line that records the bcftools annotate command itself.

$ bcftools annotate -x 'INFO/VDB' rotavirus_rf.vcf.gz | grep VDB -m2
##bcftools_annotateCommand=annotate -x INFO/VDB rotavirus_rf.vcf.gz; Date=Sun Jan 30 16:35:35 2022

Check that the variant at position 970 is still there, but without INFO/VDB.

$ bcftools annotate -x 'INFO/VDB' rotavirus_rf.vcf.gz | awk '$2==970'
RF01    970 .   A   C   48.6696 .   DP=36;SGB=10.3229;RPB=0.658863;MQB=1;MQSB=1;BQB=0.572843;MQ0F=0;ICB=0.425;HOB=0.32;AC=2;AN=10;DP4=19,7,3,5;MQ=60    GT:PL   0/0:0,9,47  0/0:0,18,73 0/0:0,18,73 0/0:0,33,116    1/1:95,24,0
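
Since an output file containing only the header is reported in this thread, a quick sanity check is to compare the number of records before and after the command. The sketch below is only an illustration: `count_records` is a hypothetical helper, the file names in the comments are placeholders, and it relies on bgzip-compressed VCF being readable with `gzip -dc`.

```shell
# Hypothetical helper: count the non-header lines of a plain or
# (b)gzip-compressed VCF file.
count_records() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -vc '^#'
}

# Intended use (placeholders, requires bcftools):
#   bcftools annotate -x 'INFO/VDB' -O z -o out.vcf.gz in.vcf.gz
#   count_records in.vcf.gz
#   count_records out.vcf.gz    # should print the same number as the input
```

If the two counts differ, the problem most likely lies in how the output was written or read back rather than in the `-x` option itself.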

Thank you Pierre. I'll look for my mistake(s).

Should I close the question? I have not found the right place to do it.


There is no need to close the question. If @Pierre's answer below helped, you can accept it (green checkmark); that provides closure to this thread. Do not delete the thread.


Thanks to everybody sharing knowledge!!

2.2 years ago

Use bcftools annotate -x 'INFO/RPBZ,INFO/MQBZ,INFO/BQBZ' in.vcf
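
For the original goal of concatenating two files, the command above could be combined with `bcftools concat` roughly as follows. This is only a sketch under several assumptions: the function name `strip_and_concat` and all file names are hypothetical, only the first file carries the extra tags, and both files have the same samples in the same order (which `bcftools concat` requires).

```shell
# Hypothetical helper; file names and the tag list are placeholders.
# Usage: strip_and_concat with_extra_tags.vcf.gz other.vcf.gz merged.vcf.gz
strip_and_concat() {
    command -v bcftools >/dev/null 2>&1 || { echo "bcftools not found" >&2; return 1; }
    tags='INFO/RPBZ,INFO/MQBZ,INFO/BQBZ'
    stripped="$3.stripped.vcf.gz"   # intermediate file, assumed safe to overwrite
    # Remove the extra INFO tags from both the header and the records.
    bcftools annotate -x "$tags" -O z -o "$stripped" "$1" || return 1
    # Concatenate the cleaned file with the second file, in that order.
    bcftools concat -O z -o "$3" "$stripped" "$2" || return 1
    # Build a tabix index for the result.
    bcftools index -t "$3"
}
```

Writing compressed output with `-O z -o` and indexing afterwards keeps every step readable by the next bcftools command.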


Thank you very much Pierre. In fact, that was my first attempt, with the -o option, but it gave me an outfile with just the header. Pablo
