Hi there, I'm working on a joint call-set of 47 VCFs which I will be merging with GLNexus. Now, I've done this before but, for some reason, since I added 2 extra samples to the original 45 (47 in total), there have been a few issues.
The original 45 samples are from the SGDP, called with UnifiedCaller; the 2 extra are the archaic Neanderthal and Denisova, which have been called with the same pipeline on hs37d5. I happened to have the 45 samples sorted since I needed this for another task, so I thought I would sort the archaic ones as well, just to speed up the merging.
Unfortunately, upon attempting to sort I was presented with the following:
Writing to 3.sgdp_hg19/arch_001/tempa2tnzI -> this is Neanderthal
[W::bcf_hrec_check] Invalid tag name: "1000gALT"
[W::vcf_parse_info] INFO '.' is not defined in the header, assuming Type=String
[W::bcf_hrec_check] Invalid tag name: "."
Error encountered while parsing the input at 1:121387974
Cleaning
Writing to 3.sgdp_hg19/arch_002/tempx8KiuF -> this is Denisova
[W::bcf_hrec_check] Invalid tag name: "1000gALT"
[W::vcf_parse_info] INFO '.' is not defined in the header, assuming Type=String
[W::bcf_hrec_check] Invalid tag name: "."
Error encountered while parsing the input at 1:2590169
Cleaning
I double-checked the 1000gALT tag in the header and in the body's entries and it appears to be present, which leaves me perplexed about why bcftools raises that issue in the first place.
Second, I don't understand what the INFO '.' is not defined in the header and the Invalid tag name: "." messages refer to... I checked the input lines 1:121387974 and 1:2590169 but they look fine to me...
Is there any way to prevent these issues, which stop me from sorting the two files, so that I can then merge the 47 samples? I was looking into bcftools annotate, but I tested it on Neanderthal and got this:
[W::bcf_hrec_check] Invalid tag name: "1000gALT"
[W::bcf_hrec_check] Invalid tag name: "1000gALT"
Warning: The tag "." not defined in the header
[W::vcf_parse_info] INFO '.' is not defined in the header, assuming Type=String
[W::bcf_hrec_check] Invalid tag name: "."
Encountered an error, cannot proceed. Please check the error output above.
If feeling adventurous, use the --force option. (At your own risk!)
which always goes back to the same problem. Note that these two archaic VCFs already had an issue in the FORMAT field, which I had to fix by reheadering them; however, these two are beyond my understanding. If anyone has any clue, I'll be very happy to try and figure out what's going on, thanks in advance!
P.S. Simply merging the files seems pointless, as the process aborts with the exact same issues.
Show us the content of the INFO column at the problematic positions.
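In case it helps with pulling those columns out, here is a minimal Python sketch that prints the INFO column (8th field) of every record at a given chrom:pos. It assumes a plain-text, tab-separated VCF; the function name `info_at` is made up for illustration and is not part of bcftools.

```python
def info_at(lines, chrom, pos):
    """Yield the INFO column (8th field) of every record at chrom:pos."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 8 and fields[0] == chrom and fields[1] == str(pos):
            yield fields[7]


# Example with the Neanderthal record quoted in this thread (tabs assumed)
record = "1\t121387974\t.\tC\t.\t.\t.\t.;CpG\tGT:A:C:G:T:IR\t./.:0,0:0,0:0,0:1,0:0\n"
for info in info_at([record], "1", 121387974):
    print(info)
```

For a bgzipped file, the lines could come from `gzip.open(path, "rt")` instead of a list.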
My bad, I switched positions between the two files when looking the first time. They do indeed look abnormal, but how can I fix it? Below is the Neanderthal position:
1 121387974 . C . . . .;CpG GT:A:C:G:T:IR ./.:0,0:0,0:0,0:1,0:0
and the Denisova position of interest:
1 2590169 . C . . . .;RM;TS=HPOM;CAnc=C;OAnc=-;rMac=-;mSC=0.300;Map20=0.25 GT:A:C:G:T:IR ./.:0,0:0,1:0,0:0,0:0
Also, this issue could be pervasive... how can I fix it file-wise? Thanks in advance @Pierre Lindenbaum
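One possible file-wide fix, sketched in Python: in the INFO column a lone "." stands for "no annotations", so a "." sitting in front of real entries (as in ".;CpG") is what bcftools reads as a tag named ".". The sketch below drops those placeholders and keeps the record itself. It assumes tab-separated plain-text input and that the "." carries no information in these files; `strip_missing_info` and `fix_stream` are hypothetical names.

```python
def strip_missing_info(line):
    """Drop '.' placeholders from the INFO column when other entries exist."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a full VCF record, leave it alone
    entries = [e for e in fields[7].split(";") if e not in (".", "")]
    # An INFO column with nothing left is written back as a single '.'
    fields[7] = ";".join(entries) if entries else "."
    return "\t".join(fields) + "\n"


def fix_stream(src, dst):
    """Copy a VCF from src to dst, cleaning the INFO column of each record."""
    for line in src:
        dst.write(strip_missing_info(line))
```

Called as `fix_stream(sys.stdin, sys.stdout)`, this could sit in a pipe before bcftools sort. Unlike filtering whole lines out with grep -v, the affected positions stay in the file.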
@Pierre Lindenbaum as I don't know the meaning of .; and I've seen there are many in both files, I simply removed them with grep -v. I believe there may be a way with bcftools but I'm not sure how; also, I can't find any relevant information in the GATK guide about VCF files to add this detail to my headers as in the previous instance. Still, I can't understand the Invalid tag name: "1000gALT" line; if I missed something please let me know, thanks!
the following line is missing in the VCF header:
OR/AND
the syntax of the TAG is wrong (tag starting with a number ?)
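To illustrate the second possibility, here is a small Python check of tag names. The pattern is the one I understand the VCFv4.3 specification to require for INFO keys (first character a letter or underscore, with "1000G" as the only named exception); whether bcftools applies exactly this rule is an assumption on my part, and `is_valid_info_key` is a hypothetical helper.

```python
import re

# INFO key pattern as given (to my understanding) in the VCFv4.3 specification
INFO_KEY = re.compile(r"^([A-Za-z_][0-9A-Za-z_.]*|1000G)$")


def is_valid_info_key(key):
    """Return True if key is an acceptable INFO ID under the pattern above."""
    return INFO_KEY.match(key) is not None


for key in ["1000gALT", ".", "CpG", "Map20"]:
    print(key, is_valid_info_key(key))
```

Under this pattern both 1000gALT (leading digit) and "." fail, which would match the two "Invalid tag name" warnings in the log.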
@Pierre Lindenbaum I see. I looked it up and, although present, the TAG goes like this in both files:
##INFO=<ID=1000gALT,Number=0,Type=Flag,Description="Alternative allele referred to by 1000G">
Looking around a bit, I think the key is to set Number= to 0 instead of 1; sorry, but until now I wasn't aware of the difference between the two. EDIT: probably I should also change the Type to Flag.
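For what it's worth, the Number/Type pairing can be checked with a short Python sketch. As far as I know the VCF specification says a Flag carries no value, so its Number must be 0. The parser below is deliberately simple (it does not handle escaped quotes inside Description), and both function names are hypothetical.

```python
import re


def parse_info_header(line):
    """Split an ##INFO=<...> header line into a dict of its fields."""
    m = re.match(r"##INFO=<(.*)>\s*$", line)
    if m is None:
        raise ValueError("not an ##INFO header line")
    # Each value is either a double-quoted string or a bare token without commas
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,]*)', m.group(1))
    return {key: value.strip('"') for key, value in pairs}


def flag_is_consistent(fields):
    """A Type=Flag entry should declare Number=0; other types are not checked."""
    if fields.get("Type") == "Flag":
        return fields.get("Number") == "0"
    return True


header = '##INFO=<ID=1000gALT,Number=0,Type=Flag,Description="Alternative allele referred to by 1000G">'
fields = parse_info_header(header)
print(fields["ID"], fields["Number"], fields["Type"], flag_is_consistent(fields))
```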
I'm not sure it's a problem.
ah yes, my bad ! (fixed)