Question: Intersecting SNPs with vcftools vcf-isec - error message
Asked 10 months ago by weiserr

Hi,

I would like to identify all of the intersecting SNPs in .vcf files generated by snippy (https://github.com/tseemann/snippy). There are 71 .vcf files, each generated by mapping reads onto a P. aeruginosa PA14 reference genome. I am trying to use vcftools vcf-isec to identify the intersecting SNPs. The .vcf files have been compressed with bgzip and indexed with tabix to give .vcf.gz and .vcf.gz.tbi files.

When I run vcf-isec on just two of the files as a test, I get the following error message:

$ vcf-isec -n +2 1_snps.vcf.gz 2_snps.vcf.gz | bgzip -c > isec.vcf.gz

Leading or trailing space in attr_key-attr_value pairs is discouraged:
        [Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
        INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
 at /usr/share/perl5/Vcf.pm line 180.

If I run vcf-validator on one of the files, I get the following error message:

$ vcf-validator 1_snps.vcf.gz

Leading or trailing space in attr_key-attr_value pairs is discouraged:
        [Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
        INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">

However, I can run a check on the file before it is compressed with bgzip, and it works, giving the following output:

$ vcftools --vcf 1_snps.vcf

VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
        --vcf Tsb_1_snps.vcf
After filtering, kept 1 out of 1 Individuals
After filtering, kept 40 out of a possible 40 Sites
Run Time = 0.00 seconds

Does anyone have any suggestions about the error messages I am getting with vcf-isec and vcf-validator? Any help would be appreciated.

Thanks.


I guess the problem is with this VCF header line:

INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">

Check whether your files contain header lines like this. It would also help if you could post the VCF headers and some example records.
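
The message complains about the space between the closing single quote and the closing double quote of the Description value. As a minimal sketch (not something posted in this thread), the trailing space could be stripped with sed. The header below is a shortened, hypothetical copy of the ANN line, and the `##` prefix is assumed because VCF meta-information lines start with it:

```shell
# Shortened, hypothetical copy of the ANN header line from the warning.
header="##INFO=<ID=ANN,Number=.,Type=String,Description=\"Functional annotations: 'Allele | Annotation' \">"

# Only touch the ANN header line: drop whitespace right after the opening
# quote and right before the closing quote of the Description value.
fixed=$(printf '%s\n' "$header" |
    sed -E '/^##INFO=<ID=ANN,/ s/Description="[[:space:]]*(.*[^[:space:]])[[:space:]]*">$/Description="\1">/')

printf '%s\n' "$fixed"
```

On a real file, the same sed expression would sit between decompression and recompression, for example `zcat 1_snps.vcf.gz | sed -E '...' | bgzip -c > 1_snps.fixed.vcf.gz`, followed by `tabix -p vcf 1_snps.fixed.vcf.gz`. The `.fixed` file name is only an illustration.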

written 10 months ago by cpad0112

Just a quick comment: you should be using BCFtools. Even one of the chief developers of VCFtools recommends switching from VCFtools to BCFtools.
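
For reference, a rough BCFtools counterpart of the vcf-isec command in the question might look like the sketch below. It assumes bcftools is installed and that the inputs are bgzip-compressed and tabix-indexed, as described in the question. The function name `isec_min2` and the output directory `isec_out` are hypothetical:

```shell
# Sketch: report sites present in at least 2 of the given files.
# $1 is the output directory; the remaining arguments are the .vcf.gz inputs.
isec_min2() {
    outdir=$1
    shift
    # -n +2 : keep positions found in 2 or more input files
    # -p DIR: write the per-file results and sites.txt into DIR
    bcftools isec -n +2 -p "$outdir" "$@"
}

# Run only when bcftools and the two test files from the question are present.
if command -v bcftools >/dev/null 2>&1 && [ -e 1_snps.vcf.gz ] && [ -e 2_snps.vcf.gz ]; then
    isec_min2 isec_out 1_snps.vcf.gz 2_snps.vcf.gz
fi
```

To keep only sites shared by every one of the 71 files, `-n =71` would presumably replace `-n +2`, with all 71 files listed as inputs.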

written 10 months ago by Kevin Blighe