Problems with vcf-merge
9.9 years ago
devenvyas ▴ 740

I have a list of SNPs from an old Illumina array. I've concatenated and filtered two sets of VCF files (http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/, http://cdna.eva.mpg.de/denisova/VCF/hg19_1000g/) from the two diploid genomes I am trying to work with. I am now trying to merge the two VCFs into a single file using:

vcf-merge AltaiNea.recode.vcf.gz Denisovan.recode.vcf.gz | bgzip -c > isec.vcf.gz

but I am getting these errors below. I need help understanding 1) what this text actually means and 2) how to fix the error. I have tried using vcf-isec instead, but I get the same/similar errors.

gzip: stdout: Broken pipe
Using column name 'DenisovaPinky' for Denisovan.recode.vcf.gz:DenisovaPinky
Could not determine the ploidy (nals=1, nvals=3). (TODO: ploidy bigger than 2)
3
 at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 177, <__ANONIO__> line 2.
        Vcf::throw('Vcf4_1=HASH(0x1a9a1e0)', 'Could not determine the ploidy (nals=1, nvals=3). (TODO: ploi...', 3) called at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 2408
        VcfReader::guess_ploidy('Vcf4_1=HASH(0x1a9a1e0)', 1, 3) called at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 1764
        VcfReader::parse_AGtags('Vcf4_1=HASH(0x1a9a1e0)', 'HASH(0x18c3528)') called at /apps/vcftools/0.1.11/bin/vcf-merge line 461
        main::merge_vcf_files('HASH(0x1598108)') called at /apps/vcftools/0.1.11/bin/vcf-merge line 12
(END)

Thanks!
-Deven


Do the VCF files individually pass vcf-validator? My guess would be not.


Do you know of any fix? I did not generate the VCFs on my own. I just downloaded them from the Max Planck.


Can you post a link to them? That'd make it easier to determine exactly what's causing this (though I have my guesses).


Here and here

I downloaded them, filtered them down to the SNPs on my array, and concatenated them. I then ran one last step to filter out both non-biallelic sites and sites labeled LowQual.

I ran vcf-validator before and after that last step.

I got error lines like this back

column AltaiNea at 1:717485 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:776546 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:846808 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:900505 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:918384 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)

and this

FILTER field at 1:1119858 .. The filter(s) [LowQual] not listed in the header.
column DenisovaPinky at 1:3765424 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:56968755 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:63384292 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:101725605 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:102350747 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:153753725 .. FORMAT tag [PL] expected different number of values (expected 6, found 3).

The list is much longer for the Neanderthal than the Denisovan.
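
For anyone reading these messages: the "expected N" figures look like the standard genotype-count arithmetic from the VCF specification. A per-genotype field such as PL (assuming it is declared with Number=G in the header) should carry one value per possible genotype, which for ploidy P and A alleles (REF plus ALTs) is C(A + P - 1, P). A diploid site with no ALT allele therefore expects 1 value, a biallelic site 3, and a site with two ALT alleles 6. The "nals=1, nvals=3" in the vcf-merge error appears consistent with the same mismatch. Below is a minimal Python sketch of that check; the function names and the two example records are hypothetical, made up for illustration and not taken from the actual files.

```python
from math import comb


def expected_genotype_values(n_alt, ploidy=2):
    """Number of values a per-genotype (Number=G) field such as PL should carry."""
    n_alleles = n_alt + 1  # REF plus every ALT allele
    return comb(n_alleles + ploidy - 1, ploidy)


def count_alt_alleles(alt_field):
    """Count ALT alleles; '.' means no alternate allele is listed."""
    if alt_field == ".":
        return 0
    return len(alt_field.split(","))


def check_pl(record_line, ploidy=2):
    """Return (expected, found) PL counts for the first sample, or None if PL is absent."""
    cols = record_line.rstrip("\n").split("\t")
    alt = cols[4]
    fmt = cols[8].split(":")
    sample = cols[9].split(":")
    if "PL" not in fmt:
        return None
    idx = fmt.index("PL")
    if idx >= len(sample) or sample[idx] == ".":
        return None
    found = len(sample[idx].split(","))
    return expected_genotype_values(count_alt_alleles(alt), ploidy), found


# HYPOTHETICAL records, invented to mimic the two kinds of validator message.
no_alt = "1\t1000\t.\tG\t.\t50\t.\t.\tGT:PL\t0/0:0,30,300"
two_alts = "1\t2000\t.\tG\tA,T\t50\t.\t.\tGT:PL\t1/2:300,30,0"

print(check_pl(no_alt))    # (expected, found) for a site with ALT '.'
print(check_pl(two_alts))  # (expected, found) for a site with two ALT alleles
```

Under these assumptions, the first record reproduces the "expected 1, found 3" pattern reported for AltaiNea and the second the "expected 6, found 3" pattern reported for DenisovaPinky, which suggests the files always write three PL values regardless of how many alleles are listed at the site.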


I am currently validating all the original files to see whether the error was introduced by one of my steps or was there from the beginning.

UPDATE: The original files are starting to come out of vcf-validator, and they are all failing. What do I do?
