VCF parse error: Could not add dummy header for contig
3.4 years ago
User000 ▴ 690

Dear all,

I split my tetraploid genome by chromosome, and further into short and long chromosome arms, so that I could run variant calling in parallel. I am now running some filtering steps. All the files worked fine except one, chr7B_long_arm, which gives me the following error:

[W::vcf_parse] Contig '��f$�h��
                                 ���eԎ���H���`ݶ
                                                  f{�Fo�Y����@00uMb�z-��I$&�gf���7Ӵ�u|'K.�oP' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'

When I look at the VCF file I don't see anything wrong. What is causing this error, and how can I locate the offending lines in the VCF file? All the other files worked fine. These are the chromosome names:

##contig=<ID=chr1A,length=585266722>
##contig=<ID=chr1B,length=681112512>
##contig=<ID=chr2A,length=775448786>
##contig=<ID=chr2B,length=790338525>
##contig=<ID=chr3A,length=746673839>
##contig=<ID=chr3B,length=836514780>
##contig=<ID=chr4A,length=736872137>
##contig=<ID=chr4B,length=676292951>
##contig=<ID=chr5A,length=669155517>
##contig=<ID=chr5B,length=701372996>
##contig=<ID=chr6A,length=615672275>
##contig=<ID=chr6B,length=698614761>
##contig=<ID=chr7A,length=728031845>
##contig=<ID=chr7B,length=722970987>
##contig=<ID=chrUn,length=498719471>

This is the Snakemake rule that runs the command:

rule filter_f1:
    input:
        donevcf="freeb/{chr}.flanking.vcf"
    output:
        f1=temp("freeb/{chr}.flanking.f1.vcf")
    shell:
        """
        /Tools/bcftools/bcftools view --types snps -m2 -M2 -q 0.01:minor {input.donevcf} > {output.f1}
        """

and this is the output of file my.vcf:

my.vcf: Variant Call Format (VCF) version 4.2, ASCII text, with very long lines

The same file also gives me an error when I try to compress it with bgzip and index it with tabix:

[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "P���F�.��o��9B<~."
vcf freebayes • 2.8k views

What was the command line? What is the output of file the.vcf? What are the names of the chromosomes in the reference?


I updated the question to answer all of your questions.


Thank you.

What is the output of grep '^chr7B_long_arm' the.vcf | file -

and please show us one line from

grep -m1 '^chr7B_long_arm' the.vcf
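If the contig names in the data lines are not known in advance, a more general check than grep is to scan the file for data lines that look wrong. The following is only a minimal sketch, assuming an uncompressed, ASCII-only VCF (as file reported above); the function name find_suspicious_lines and the limit parameter are made up for illustration and are not part of bcftools or htslib.

```python
import re

# Matches header lines such as ##contig=<ID=chr7B,length=722970987>
CONTIG_RE = re.compile(rb"^##contig=<ID=([^,>]+)")


def find_suspicious_lines(path, limit=10):
    """Return up to `limit` (line_number, reason) pairs for odd-looking data lines.

    Hypothetical helper: assumes an uncompressed VCF that should contain
    only printable ASCII plus tabs and newlines.
    """
    contigs = set()
    problems = []
    # Binary mode, so corrupt bytes are inspected instead of raising decode errors.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(b"#"):
                match = CONTIG_RE.match(line)
                if match:
                    contigs.add(match.group(1))
                continue
            chrom = line.split(b"\t", 1)[0].rstrip(b"\r\n")
            has_binary = any(
                (byte < 0x20 and byte not in (0x09, 0x0A, 0x0D)) or byte > 0x7E
                for byte in line
            )
            if has_binary:
                problems.append((number, "non-printable bytes"))
            elif chrom not in contigs:
                problems.append((number, "CHROM not declared in header"))
            if len(problems) >= limit:
                break
    return problems
```

Calling find_suspicious_lines("the.vcf") returns the line numbers of the first few suspicious records, which can then be inspected with sed -n or less.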


"These are the chromosomes names"

I don't see chr7B_long_arm in that list.


Actually, chr7B_long_arm is the name of my VCF file that does not work...


I solved the problem by going back to the original bgzipped file and decompressing it again. Now it works, so something probably went wrong when I decompressed the files the first time.
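The thread does not establish what went wrong during the first decompression. One way to make a repeat of that step less fragile is sketched below, under assumptions: the archive is read with Python's gzip module (which can read bgzipped files, since BGZF is a series of gzip members) and the output is written under a temporary name that is only renamed once the whole stream has passed gzip's CRC and length checks. The function name decompress_checked and the ".partial" suffix are hypothetical.

```python
import gzip
import os
import shutil
import zlib


def decompress_checked(src_gz, dest):
    """Decompress src_gz to dest, keeping dest only if the whole stream is valid.

    Hypothetical helper: returns the size of the decompressed file in bytes.
    """
    tmp = dest + ".partial"  # assumed temporary name, renamed on success
    try:
        with gzip.open(src_gz, "rb") as src, open(tmp, "wb") as out:
            # gzip checks each member's CRC32 and length while reading.
            shutil.copyfileobj(src, out, length=1024 * 1024)
    except (OSError, EOFError, zlib.error):
        # Truncated or corrupt input: do not leave a partial VCF behind.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    os.replace(tmp, dest)
    return os.path.getsize(dest)
```

With this pattern a truncated or damaged archive raises an exception instead of silently producing a VCF that only fails later in bcftools or tabix.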
