Hello,
I'm merging several VCF files into one with bcftools merge. Here's an example of one of the individual VCFs:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 01GO_S2
chr1 14604 N/A A G . PASS . GT:GQ:DP:SR:VR:VA:SB:ABQ:AMQ 0/1:71.03:35:14.29:14.29:0:5.71:37.00:48.20
chr1 14610 N/A T C . PASS . GT:GQ:DP:SR:VR:VA:SB:ABQ:AMQ 0/1:88.73:43:13.95:13.95:0:4.65:37.00:50.67
COMMAND:
bcftools merge -0 --missing-to-ref a.vcf.gz b.vcf.gz c.vcf.gz -o d.vcf
The merged vcf looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 01GO_S2 01MAR_S1 01PB_S4
chr1 14574 N/A A G . PASS . GT:GQ:DP:SR:VR:VA:SB:ABQ:AMQ 0/0:.:.:.:.:.:.:.:. 0/1:80.46:14:28.57:28.57:0:0:34:33 0/0:.:.:.:.:.:.:.:.
chr1 14590 N/A G A . PASS . GT:GQ:DP:SR:VR:VA:SB:ABQ:AMQ 0/0:.:.:.:.:.:.:.:. 0/1:101.13:25:20:20:0:8:37:38.6 0/1:62.74:10:30:30:0:0:37:57
When trying to use tapes, I receive the following error:
File "/home/bruno/.local/lib/python3.10/site-packages/vcf_parser/utils/format_variant.py", line 73, in format_variant
raise SyntaxError("The INFO field {0} is not specified in vcf"\
SyntaxError: The INFO field . is not specified in vcf header. chr1 14574 N/A A G . PASS . GT:GQ:DP:SR:VR:VA:SB:ABQ:AMQ 0/0:.:.:.:.:.:.:.:. 0/1:80.46:14:28.57:28.57:0:0:34:33 0/0:.:.:.:.:.:.:.:.
It seems that tapes (and other scripts that use the vcf_parser Python library) now reads the '.' in the INFO column as an actual field rather than as a missing value. However, the individual VCFs didn't have anything in the INFO field either. I tried manipulating the header, to no avail. It seems like a simple issue (my last resort is to add a fake INFO field to every row and declare it in the header).
Can someone help? Thanks in advance
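To make the suspicion above concrete, here is a minimal Python sketch of an INFO-column parser that treats a lone '.' as "no INFO" instead of as a key. It is not vcf_parser's actual code; the function names parse_info and undeclared_info_keys are hypothetical and only illustrate the check that the error message suggests is failing.

```python
def parse_info(info_column):
    """Parse a VCF INFO column into a dict, treating '.' as 'no INFO'."""
    # In VCF, a lone '.' marks a missing value, so there are no keys to report.
    if info_column in (".", ""):
        return {}
    parsed = {}
    for entry in info_column.split(";"):
        if not entry:
            continue
        # Flags have no '=' (e.g. 'DB'); key/value pairs look like 'DP=35'.
        key, sep, value = entry.partition("=")
        parsed[key] = value.split(",") if sep else True
    return parsed


def undeclared_info_keys(info_column, declared_keys):
    """Return INFO keys used in a record but not declared in the header."""
    return sorted(set(parse_info(info_column)) - set(declared_keys))
```

With this handling, a record whose INFO column is '.' reports no undeclared keys, whatever the header declares.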
Welcome. Can you please paste the lines of the VCF as text into the question and format them as code, rather than as pictures? Thank you.
For anyone who happens to land here: I managed to 'solve' the problem.
I have two types of data from the same samples, FASTA and VCF, both provided by the company, so we don't know which procedures were used to produce the VCFs, such as adapter trimming or which databases the SNPs were called against. In any case, their VCFs were not parsimony-normalized, so genotype.py, which parses some arguments and information, got scrambled. My guess is that genotypes other than 0/0, 1/0, 0/1 and 1/1 broke the data flow.
So normalizing with bcftools solved it.
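The exact bcftools command is not given here (bcftools norm is the subcommand that handles normalization). To check whether a file contains the kind of genotypes guessed at above, a rough Python sketch like the following could help. The function name find_unusual_genotypes and the set of "simple" genotypes are assumptions taken from the guess in this post, not anything tapes or vcf_parser defines.

```python
SIMPLE_GENOTYPES = {"0/0", "0/1", "1/0", "1/1"}


def find_unusual_genotypes(vcf_lines, allowed=SIMPLE_GENOTYPES):
    """Yield (chrom, pos, sample, genotype) for genotypes not in `allowed`.

    Note: phased calls such as '0|1' and missing calls such as './.' are
    also reported, because they are not in the default allowed set.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        # Real VCFs are tab-separated; fall back to whitespace for pasted text.
        fields = line.split("\t") if "\t" in line else line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            if genotype not in allowed:
                yield fields[0], fields[1], sample, genotype
```

Passing an open file handle (for example from gzip.open(path, "rt") for a .vcf.gz) as vcf_lines should work, since the function only iterates over lines. If nothing is reported after normalization, the remaining genotypes are all in the simple set.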