Question: Problems with vcf-merge
devenvyas (Stony Brook) wrote 4.9 years ago:

I have a list of SNPs from an old Illumina array. I've concatenated and filtered two sets of VCF files (http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/, http://cdna.eva.mpg.de/denisova/VCF/hg19_1000g/) from two diploid genomes I am trying to work with. I am now trying to merge the two VCFs into a single file using:

vcf-merge AltaiNea.recode.vcf.gz Denisovan.recode.vcf.gz | bgzip -c > isec.vcf.gz

but I am getting the errors below. I need help understanding (1) what this text actually means and (2) how to fix the error. I have tried using vcf-isec instead, but I get the same or similar errors.

gzip: stdout: Broken pipe
Using column name 'DenisovaPinky' for Denisovan.recode.vcf.gz:DenisovaPinky
Could not determine the ploidy (nals=1, nvals=3). (TODO: ploidy bigger than 2)
3
 at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 177, <__ANONIO__> line 2.
        Vcf::throw('Vcf4_1=HASH(0x1a9a1e0)', 'Could not determine the ploidy (nals=1, nvals=3). (TODO: ploi...', 3) called at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 2408
        VcfReader::guess_ploidy('Vcf4_1=HASH(0x1a9a1e0)', 1, 3) called at /apps/vcftools/0.1.11/lib/perl5/site_perl/Vcf.pm line 1764
        VcfReader::parse_AGtags('Vcf4_1=HASH(0x1a9a1e0)', 'HASH(0x18c3528)') called at /apps/vcftools/0.1.11/bin/vcf-merge line 461
        main::merge_vcf_files('HASH(0x1598108)') called at /apps/vcftools/0.1.11/bin/vcf-merge line 12

Thanks!

-Deven

Do the VCF files individually pass vcf-validator? My guess would be not.

Written 4.9 years ago by Devon Ryan

Do you know of any fix? I did not generate the VCFs myself. I just downloaded them from the Max Planck Institute.

Written 4.9 years ago by devenvyas

Can you post a link to them? That'd make it easier to determine exactly what's causing this (though I have my guesses).

Written 4.9 years ago by Devon Ryan

Here and here

http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/

http://cdna.eva.mpg.de/denisova/VCF/hg19_1000g/

I downloaded them, filtered them down based on the list of SNPs on my array, and concatenated them. I then ran one last step to filter out both non-biallelic sites and sites labeled LowQual.

I ran vcf-validator before and after that last step.

I got back error lines like these:

column AltaiNea at 1:717485 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:776546 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:846808 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:900505 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)
column AltaiNea at 1:918384 .. FORMAT tag [PL] expected different number of values (expected 1, found 3)

and these:

FILTER field at 1:1119858 ..  The filter(s) [LowQual] not listed in the header.
column DenisovaPinky at 1:3765424 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:56968755 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:63384292 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:101725605 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:102350747 .. FORMAT tag [PL] expected different number of values (expected 6, found 3)
column DenisovaPinky at 1:153753725 .. FORMAT tag [PL] expected different number of values (expected 6, found 3).

The list is much longer for the Neanderthal than the Denisovan.
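
The numbers in these messages are consistent with PL being treated as a per-genotype field (Number=G in the VCF header). That is an assumption here, since the headers are not shown in the thread. Under that assumption, the expected number of PL values depends only on the number of alleles at the site (REF plus any ALT alleles) and the ploidy. A minimal sketch of that count, where the function name is made up for illustration:

```python
from math import comb


def expected_genotype_values(n_alleles: int, ploidy: int = 2) -> int:
    """Number of values a per-genotype (Number=G) field should carry.

    Unordered genotypes are multisets of size `ploidy` drawn from
    `n_alleles` alleles, so the count is C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


# Diploid case for 1, 2 and 3 alleles at a site.
for n_alleles in (1, 2, 3):
    print(n_alleles, expected_genotype_values(n_alleles))
# prints:
# 1 1
# 2 3
# 3 6
```

Read this way, "expected 1, found 3" would mean the record lists only one allele (for example, ALT given as ".") while PL still carries three values, and "expected 6, found 3" would mean the record lists three alleles while PL carries values for only two. The vcf-merge message (nals=1, nvals=3) fits the same pattern: with a single allele every ploidy gives exactly one genotype, so three values presumably cannot be matched to any ploidy.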

Written 4.9 years ago by devenvyas

I am currently validating all the original files to see whether the errors were introduced by my own processing or were there from the beginning.

UPDATE: Results for the original files are starting to come out of vcf-validator, and they are all failing. What do I do?
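
One way to see exactly which records are affected, without waiting on the validator, is to count the PL values per record directly. The following is only a rough sketch: it assumes diploid samples, assumes PL is meant to be a per-genotype (Number=G) field, and treats an ALT of "." as "no alternate allele". The function name and the demo records are invented for illustration and are not taken from the Altai or Denisova files.

```python
from math import comb


def pl_mismatches(lines, ploidy=2):
    """Yield (chrom, pos, sample, expected, found) for PL fields of unexpected length."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        chrom, pos, alt, fmt = fields[0], fields[1], fields[4], fields[8]
        keys = fmt.split(":")
        if "PL" not in keys:
            continue
        pl_index = keys.index("PL")
        # REF counts as one allele; ALT "." adds none.
        n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
        expected = comb(n_alleles + ploidy - 1, ploidy)
        for i, sample_field in enumerate(fields[9:]):
            values = sample_field.split(":")
            if pl_index >= len(values) or values[pl_index] == ".":
                continue  # PL trimmed or missing for this sample
            found = len(values[pl_index].split(","))
            if found != expected:
                name = samples[i] if i < len(samples) else f"sample{i + 1}"
                yield chrom, pos, name, expected, found


# Invented records: one with ALT ".", one biallelic, one with two ALT alleles.
demo_lines = [
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1"]),
    "\t".join(["1", "100", ".", "A", ".", "50", ".", ".", "GT:PL", "0/0:0,30,300"]),
    "\t".join(["1", "200", ".", "A", "G", "50", ".", ".", "GT:PL", "0/1:30,0,40"]),
    "\t".join(["1", "300", ".", "A", "G,T", "50", ".", ".", "GT:PL", "1/2:30,0,40"]),
]
mismatches = list(pl_mismatches(demo_lines))
for record in mismatches:
    print(*record)
```

For a compressed file, the same function could be fed from `gzip.open("AltaiNea.recode.vcf.gz", "rt")`, since bgzip output is readable as ordinary gzip. Comparing the flagged positions before and after filtering would show whether the filtering step changed anything.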

Modified 4.9 years ago; written 4.9 years ago by devenvyas