Intersecting SNPs with vcftools vcf-isec - error message
6.0 years ago
weiserr

Hi,

I would like to identify all of the intersecting SNPs in the .vcf files generated by snippy (https://github.com/tseemann/snippy). There are 71 .vcf files, generated by mapping reads onto the P. aeruginosa PA14 reference genome. I am trying to use vcftools' vcf-isec to identify the intersecting SNPs. The .vcf files have been compressed with bgzip and indexed with tabix, giving .vcf.gz and .vcf.gz.tbi files.

When I run vcf-isec on just two of the files as a test, I get the following error message:

$ vcf-isec -n +2 1_snps.vcf.gz 2_snps.vcf.gz | bgzip -c > isec.vcf.gz

Leading or trailing space in attr_key-attr_value pairs is discouraged:
        [Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
        INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
 at /usr/share/perl5/Vcf.pm line 180.

If I run vcf-validator on one of the files, I get the following error message:

$ vcf-validator 1_snps.vcf.gz

Leading or trailing space in attr_key-attr_value pairs is discouraged:
        [Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
        INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
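Both messages name the same header line: the Description value of the ANN field ends with a space just before its closing quote (INFO' "). A minimal shell sketch for listing every ## header line whose Description value starts or ends with a space is below. The function name find_spaced_descriptions is hypothetical, and it only looks at Description values, not at every key=value pair that Vcf.pm checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): print, with line numbers, every ## header
# line whose Description="..." value has a leading or trailing space,
# such as the ANN line ending in  INFO' ">
# Only Description values are checked; other attributes are ignored.
# Reads VCF text on stdin, so a bgzipped file can be piped in via gzip -dc.
find_spaced_descriptions() {
  # ( |[^"]* ") matches a space right after the opening quote,
  # or a space right before the closing quote.
  grep -nE '^##.*Description="( |[^"]* ")'
}
```

For example: gzip -dc 1_snps.vcf.gz | find_spaced_descriptions. No output means no such Description values were found (grep then exits with status 1).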

However, I can run a check on the file before it is compressed with bgzip, and that works, giving the following output:

$ vcftools --vcf 1_snps.vcf

VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
        --vcf Tsb_1_snps.vcf
After filtering, kept 1 out of 1 Individuals
After filtering, kept 40 out of a possible 40 Sites
Run Time = 0.00 seconds

Does anyone have any suggestions about the error messages I am getting with vcf-isec and vcf-validator? Any help would be appreciated.

Thanks.

I guess the problem is with this VCF header line:

INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">

Check whether your files have header lines like this one. It would also help if you could post the VCF header and some example records.
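If the files do contain that header line, one way to clean them is to remove the spaces just inside the quotes of each Description value, then recompress and reindex. The sketch below assumes sed with -E support (GNU or BSD), bgzip and tabix on PATH, and input files named like the question's 1_snps.vcf.gz; strip_description_space and fix_all_vcfs are hypothetical helper names, and the .fixed.vcf.gz output naming is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): remove spaces just inside the quotes of
# Description="..." on ## header lines. Lines not starting with ## (the
# #CHROM line and the variant records) are passed through unchanged.
strip_description_space() {
  sed -E \
    -e '/^##/s/Description=" +/Description="/' \
    -e '/^##/s/(Description="[^"]*[^ "]) +"/\1"/'
}

# Hypothetical batch step (sketch): for every *_snps.vcf.gz in the current
# directory, write a cleaned *_snps.fixed.vcf.gz, bgzip-compressed and
# tabix-indexed. Assumes bgzip and tabix (htslib) are on PATH.
fix_all_vcfs() {
  local f out
  for f in *_snps.vcf.gz; do
    [ -e "$f" ] || continue   # glob matched nothing
    out="${f%.vcf.gz}.fixed.vcf.gz"
    gzip -dc "$f" | strip_description_space | bgzip -c > "$out" &&
      tabix -p vcf "$out"
  done
}
```

Running fix_all_vcfs in the directory holding the 71 files and then pointing vcf-validator at one of the .fixed.vcf.gz outputs is a quick way to check whether that header line was the only problem. Because only ## header lines are edited, the variant records themselves are left as they were.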

Just a quick comment: you should be using BCFtools. Even one of the chief developers of VCFtools recommends switching from VCFtools to BCFtools.
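For reference, a rough BCFtools counterpart of the vcf-isec -n +2 command from the question is sketched below. isec_sites and isec_vcfs are hypothetical wrapper names; the bcftools isec options used (-n with a + prefix for "present in this many or more files", and -p for an output directory) come from the bcftools documentation, and the inputs are assumed to be the bgzipped, tabix-indexed files described in the question.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers (sketch) around bcftools isec.
# Inputs must be bgzip-compressed and indexed (.tbi or .csi).

# Print the list of positions present in at least MIN_FILES inputs.
# Usage: isec_sites MIN_FILES file1.vcf.gz file2.vcf.gz ...
isec_sites() {
  local min_files=$1
  shift
  bcftools isec -n "+${min_files}" "$@"
}

# Same selection, but write one VCF per input file into OUT_DIR.
# Usage: isec_vcfs MIN_FILES OUT_DIR file1.vcf.gz file2.vcf.gz ...
isec_vcfs() {
  local min_files=$1 out_dir=$2
  shift 2
  bcftools isec -n "+${min_files}" -p "$out_dir" "$@"
}
```

For example, isec_sites 2 1_snps.vcf.gz 2_snps.vcf.gz > shared_sites.txt mirrors the two-file test, and isec_vcfs 2 isec_out *_snps.vcf.gz would cover all 71 files in one call.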
