Error after merging files with bcftools: wrong number of fields?
2.1 years ago
Lukáš ▴ 30

I have multiple VCF files of variants for CASES and CONTROLS, annotated with VEP, SnpEff and SnpSift.

first pair of VCFs -> variants only | CASES and CONTROLS
second pair of VCFs -> variants + SnpEff | CASES and CONTROLS
third pair of VCFs -> variants + SnpEff + VEP + SnpSift | CASES and CONTROLS | and so on

Because the CASES and CONTROLS files have non-overlapping samples, I tried to merge each corresponding pair into one file with bcftools merge.

bgzip -c file.vcf > file.vcf.gz
tabix -p vcf file.vcf.gz
bcftools merge -o merged.vcf.gz -Oz file.vcf.gz file_2.vcf.gz

Unfortunately, for the files with SnpSift annotations I got this error:

Error at chr1:20724637: wrong number of fields in dbNSFP_ada_score?

Does anybody know what this error means or how to solve it?

vcf • 1.2k views

What is the output of the following commands:

bcftools view --header-only  file.vcf.gz | grep dbNSFP_ada_score
bcftools view --header-only  file_2.vcf.gz | grep dbNSFP_ada_score
bcftools view -G  file.vcf.gz "chr1:20724637"
bcftools view -G file_2.vcf.gz "chr1:20724637"

grep header:

INFO=<ID=dbNSFP_ada_score,Number=A,Type=String,Description="Field 'ada_score' from dbNSFP">

INFO=<ID=dbNSFP_ada_score,Number=A,Type=String,Description="Field 'ada_score' from dbNSFP">

-G on chr1:

CHROM POS ID REF ALT QUAL FILTER INFO

chr1 20724637 rs1280663306;rs1445195248 TGGGAGGGAGGGAGAGAGGT TGGGAGGGAGAGAGGT,TGGGAGGGAGGGAGGGAGGT,TGGGAGGGAGGGAGAGAGGG,TGGGGGGGAGGGAGAGAGGT,TGGGTGGGAGGGAGAGAGGT,TGGGAGAGAGGGAGAGAGGT 2.42927 . ANN=TGGGGGGGAGGGAGAGAGGT|splice_region_variant&intron_variant|LOW|SH2D5|SH2D5

...SNV|1||||||||||||||||||||||||||||||||||||||||||||;dbNSFP_ada_score=1.43020658327008E-4,0.00157457299236331,5.16496450143413E-5;dbNSFP_rf_score=0.016,0.054,0.0;dbNSFP_PHRED=5.632,0.077,6.365,7.459,2.716

CHROM POS ID REF ALT QUAL FILTER INFO

chr1 20724637 rs1280663306;rs1445195248 TGGGAGGGAGGGAGAGAGGT TGGGAGGGAGAGAGGT,TGGGAGGGAGGGAGGGAGGT,TGGGAGGGAGGGAGAGAGGG,TGGGGGGGAGGGAGAGAGGT,TGGGTGGGAGGGAGAGAGGT,TGGGAGAGAGGGAGAGAGGT 2.42927 . ANN=TGGGGGGGAGGGAGAGAGGT|splice_region_variant&intron_variant|LOW|SH2D5|SH2D5|transcript|NM_001103161.2|protein_coding|5/9|..

..||||||||||||||||||||||||||;dbNSFP_ada_score=1.43020658327008E-4,0.00157457299236331,5.16496450143413E-5;dbNSFP_rf_score=0.016,0.054,0.0;dbNSFP_PHRED=5.632,0.077,6.365,7.459,2.716

I had to cut the output because it would not be readable, and I think it would be too long for a reply. I'll try to add it to the question.


Unfortunately the content is over the limit. The only difference I can see is that the dbNSFP annotations are separated from the other annotations by ; instead of |.

My last idea is that there could be some difference between the CASES and CONTROLS files.

2.1 years ago

Your VCF is badly annotated.

There is a variant with 6 ALT alleles:

chr1 20724637 rs1280663306;rs1445195248 TGGGAGGGAGGGAGAGAGGT TGGGAGGGAGAGAGGT,TGGGAGGGAGGGAGGGAGGT,TGGGAGGGAGGGAGAGAGGG,TGGGGGGGAGGGAGAGAGGT,TGGGTGGGAGGGAGAGAGGT,TGGGAGAGAGGGAGAGAGGT

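and the header declaration for dbNSFP_ada_score is INFO=<ID=dbNSFP_ada_score,Number=A,Ty..:

Number=A means that there MUST be exactly one value for each ALT allele.

But there are only 3 values for this variant: 1.43020658327008E-4,0.00157457299236331,5.16496450143413E-5, while there are 6 ALT alleles. (dbNSFP_rf_score likewise lists only 3 values: 0.016,0.054,0.0.)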
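To find every record with this kind of mismatch, not just the first one that bcftools reports, the counts can be compared directly. The following is a minimal sketch, not part of bcftools: `check_number_a` is a hypothetical helper name, and it assumes the tag is declared Number=A and that the input is a tab-separated VCF body with ALT in column 5 and INFO in column 8.

```shell
# check_number_a TAG: read VCF records on stdin and report every record
# where TAG does not carry exactly one value per ALT allele.
# (Hypothetical helper; assumes TAG is declared Number=A in the header.)
check_number_a() {
  awk -F'\t' -v tag="$1" '
    /^#/ { next }                               # skip header lines
    {
      n_alt  = split($5, alt, ",")              # column 5 is ALT
      n_info = split($8, info, ";")             # column 8 is INFO
      for (i = 1; i <= n_info; i++) {
        if (index(info[i], tag "=") == 1) {     # entry starts with "TAG="
          n_val = split(substr(info[i], length(tag) + 2), val, ",")
          if (n_val != n_alt)
            printf "%s:%s %s has %d values for %d ALT alleles\n", $1, $2, tag, n_val, n_alt
        }
      }
    }'
}
```

It could be run as `gzip -dc file.vcf.gz | check_number_a dbNSFP_ada_score`. For the record shown above it should report 3 values for 6 ALT alleles at chr1:20724637.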


Thank you very much. So it is badly annotated. But this only applies to the dbNSFP scores annotated by SnpSift. So do I have to redo the whole annotation of the VCF file, or only the dbNSFP part of it?

I am sorry if this is trivial, but I am only familiar with the concept of annotation.

Once again, thank you so much for your answer.


I would remove the annotation from the two VCF files (bcftools annotate -x '...'), merge the files, and then re-annotate the merged file.
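The strip-then-merge part of that suggestion could look roughly like the sketch below. `strip_and_merge` and the `.noinfo.vcf.gz` suffix are hypothetical names chosen for illustration, and the list of annotations to remove is left as a parameter: `INFO` asks `bcftools annotate -x` to drop all INFO tags, while a list such as `INFO/dbNSFP_ada_score,INFO/dbNSFP_rf_score` drops only the named ones. Whether bcftools accepts the malformed records at the annotate step has not been verified here.

```shell
# strip_and_merge REMOVE OUT IN1 IN2
#   REMOVE : value passed to "bcftools annotate -x" (e.g. INFO, or INFO/TAG1,INFO/TAG2)
#   OUT    : merged, bgzipped output VCF
#   IN1/2  : bgzipped input VCFs whose names end in .vcf.gz
# (Hypothetical helper; file naming is an assumption.)
strip_and_merge() {
  remove=$1 out=$2 a=$3 b=$4
  for f in "$a" "$b"; do
    s="${f%.vcf.gz}.noinfo.vcf.gz"
    # drop the requested annotations and write a bgzipped copy
    bcftools annotate -x "$remove" -Oz -o "$s" "$f" || return 1
    # bcftools merge needs indexed inputs
    tabix -p vcf "$s" || return 1
  done
  bcftools merge -Oz -o "$out" "${a%.vcf.gz}.noinfo.vcf.gz" "${b%.vcf.gz}.noinfo.vcf.gz"
}
```

For the files in the question this would be called as `strip_and_merge INFO merged.vcf.gz file.vcf.gz file_2.vcf.gz`, after which the merged file can be annotated again.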
