Hi,
I would like to identify all of the intersecting SNPs in .vcf files generated by snippy (https://github.com/tseemann/snippy). There are 71 .vcf files, generated by mapping reads onto a P. aeruginosa PA14 reference genome. I am trying to use vcf-isec from VCFtools to identify the intersecting SNPs. The .vcf files have been compressed with bgzip and indexed with tabix to give .vcf.gz and .vcf.gz.tbi files.
When I run vcf-isec on just two of the files as a test, I get the following error message:
$ vcf-isec -n +2 1_snps.vcf.gz 2_snps.vcf.gz | bgzip -c > isec.vcf.gz
Leading or trailing space in attr_key-attr_value pairs is discouraged:
[Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
at /usr/share/perl5/Vcf.pm line 180.
If I run vcf-validator on one of the files, I get the same message:
$ vcf-validator 1_snps.vcf.gz
Leading or trailing space in attr_key-attr_value pairs is discouraged:
[Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
However, if I run vcftools on the file before it is compressed with bgzip, it works and gives the following output:
$ vcftools --vcf 1_snps.vcf
VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf Tsb_1_snps.vcf
After filtering, kept 1 out of 1 Individuals
After filtering, kept 40 out of a possible 40 Sites
Run Time = 0.00 seconds
Does anyone have any suggestions about the messages I am getting from vcf-isec and vcf-validator? Any help would be appreciated.
Thanks.
I guess the problem is with this VCF header line:
INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
Check whether your files contain header lines like this one. It would also help if you could post the VCF headers and a few example records.
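If the space before the closing quote of the Description really is what the parser objects to, one option is to strip it from the header and then re-compress and re-index the file. This is only a minimal sketch: it assumes a sed that supports -E, the function name fix_vcf_header and the shortened demo header are made up for illustration, and I have not confirmed that this change alone makes vcf-isec run cleanly.

```shell
# Hypothetical helper: remove whitespace that sits directly before the
# closing quote of a "##" header line. Data records are left untouched.
fix_vcf_header() {
    sed -E '/^##/ s/[[:space:]]+">$/">/'
}

# Demo on a shortened, made-up version of the ANN header line.
printf '%s\n' "##INFO=<ID=ANN,Number=.,Type=String,Description=\"Functional annotations: 'Allele | Annotation' \">" \
    | fix_vcf_header

# Possible use on a real file (not run here; output name is an assumption):
#   fix_vcf_header < 1_snps.vcf | bgzip -c > 1_snps.fixed.vcf.gz
#   tabix -p vcf 1_snps.fixed.vcf.gz
```

The substitution is restricted to lines starting with "##" so that nothing in the variant records can be altered by accident.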
Just a quick comment: you should be using BCFtools. Even one of the chief developers of VCFtools recommends switching from VCFtools to BCFtools.
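For reference, a rough BCFtools counterpart of the vcf-isec test from the question might look like the sketch below. It assumes bcftools is installed and that the inputs are bgzipped and tabix-indexed, as they already are here. The wrapper name isec_min and the output directory isec_out are made up for illustration; -n and -p are options of bcftools isec.

```shell
# Hypothetical wrapper: keep sites present in at least MIN of the given
# files and write the results into OUTDIR.
# Usage: isec_min MIN OUTDIR file1.vcf.gz file2.vcf.gz [...]
isec_min() {
    min="$1"
    outdir="$2"
    shift 2
    # -n+MIN : positions present in MIN or more of the input files
    # -p DIR : directory that bcftools isec writes its output into
    bcftools isec "-n+${min}" -p "$outdir" "$@"
}

# Example matching the two-file test above (not run here):
#   isec_min 2 isec_out 1_snps.vcf.gz 2_snps.vcf.gz
```

To require a site in every one of the 71 files rather than in two or more, bcftools isec also accepts an exact count in the form -n=71.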