Question: VCF parse error: Could not add dummy header for contig
User000440 wrote (12 weeks ago):

Dear all,

I split my tetraploid genome into chromosomes, and each chromosome into its short and long arm, in order to run variant calling in parallel. I am now running some filtering steps, and all the chromosomes worked fine except one, chr7B_long_arm, which gives me the following error:

[W::vcf_parse] Contig '��f$�h��
                                 ���eԎ���H���`ݶ
                                                  f{�Fo�Y����@00uMb�z-��I$&�gf���7Ӵ�u|'K.�oP' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'
[W::vcf_parse] Contig 'P���F�.��o��9B<~.' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse] Could not add dummy header for contig 'P���F�.��o��9B<~.'

When I look at the VCF file I don't see any error, and I don't understand what the problem is. Worse, I don't know how to find the offending records in the VCF file to see what is going wrong, since all the other files worked just fine. These are the chromosome names:

##contig=<ID=chr1A,length=585266722>
##contig=<ID=chr1B,length=681112512>
##contig=<ID=chr2A,length=775448786>
##contig=<ID=chr2B,length=790338525>
##contig=<ID=chr3A,length=746673839>
##contig=<ID=chr3B,length=836514780>
##contig=<ID=chr4A,length=736872137>
##contig=<ID=chr4B,length=676292951>
##contig=<ID=chr5A,length=669155517>
##contig=<ID=chr5B,length=701372996>
##contig=<ID=chr6A,length=615672275>
##contig=<ID=chr6B,length=698614761>
##contig=<ID=chr7A,length=728031845>
##contig=<ID=chr7B,length=722970987>
##contig=<ID=chrUn,length=498719471>
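One way to locate the damaged records without relying on bcftools is to scan the uncompressed VCF and report every data line that contains bytes other than tab and printable ASCII, or whose CHROM is not declared in a ##contig header line. This is a minimal Python sketch, assuming a plain-text (not bgzipped) VCF whose data lines should be pure ASCII; find_bad_records is a hypothetical helper, not part of bcftools or htslib.

```python
import re
import sys

# Matches the ID inside a "##contig=<ID=...,length=...>" header line
CONTIG_RE = re.compile(rb"^##contig=<ID=([^,>]+)")
# Any byte that is neither a tab nor printable ASCII (0x20-0x7E)
NONPRINT_RE = re.compile(rb"[^\t\x20-\x7e]")


def find_bad_records(lines):
    """Yield (line_number, reason, preview) for suspicious lines of a plain-text VCF.

    `lines` is any iterable of raw byte lines, e.g. a file opened in "rb" mode,
    so corrupted bytes never trigger a UnicodeDecodeError.
    """
    contigs = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if line.startswith(b"##"):
            match = CONTIG_RE.match(line)
            if match:
                contigs.add(match.group(1))
            continue
        if line.startswith(b"#"):
            continue  # the #CHROM column header line
        if NONPRINT_RE.search(line):
            yield lineno, "non-printable bytes", line[:40]
        elif line.split(b"\t", 1)[0] not in contigs:
            yield lineno, "contig not in header", line[:40]


# Usage (hypothetical script name): python find_bad_records.py chr7B_long_arm.vcf
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as vcf:
        for lineno, reason, preview in find_bad_records(vcf):
            print(f"line {lineno}: {reason}: {preview!r}")
```

A reported line number can then be inspected directly, for example with sed -n '1234p' chr7B_long_arm.vcf, where 1234 stands for the number the script printed.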

This is the command line:

rule filter_f1:
    input:
        donevcf="freeb/{chr}.flanking.vcf"
    output:
        f1=temp("freeb/{chr}.flanking.f1.vcf")
    shell:
        """
        /Tools/bcftools/bcftools view --types snps -m2 -M2 -q 0.01:minor {input.donevcf} > {output.f1}
        """

And this is the output of file my.vcf:

my.vcf: Variant Call Format (VCF) version 4.2, ASCII text, with very long lines

The same file also gives an error when I compress it with bgzip and index it with tabix:

[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "P���F�.��o��9B<~."
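The tabix message indicates that the offending line could not be parsed as a VCF record. With -p vcf, tabix reads CHROM from column 1 and POS from column 2, and the records must be sorted by position, with each contig in one contiguous block. A quick pre-check is to find the first data line that breaks one of those expectations. The Python sketch below only approximates htslib's checks, and first_unindexable_line is a hypothetical helper.

```python
def first_unindexable_line(lines):
    """Return (line_number, reason) for the first data line tabix -p vcf would likely reject, else None.

    `lines` is any iterable of raw byte lines, e.g. a plain-text VCF opened in "rb" mode.
    """
    seen_contigs = set()
    current, last_pos = None, 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if line.startswith(b"#"):
            continue  # header lines start with '#', which the vcf preset skips
        fields = line.split(b"\t")
        if len(fields) < 2 or not fields[1].isdigit():
            return lineno, "no integer POS in column 2"
        chrom, pos = fields[0], int(fields[1])
        if chrom != current:
            # A contig seen earlier reappearing later means the blocks are split up
            if chrom in seen_contigs:
                return lineno, "contig block is not contiguous"
            seen_contigs.add(chrom)
            current, last_pos = chrom, 0
        elif pos < last_pos:
            return lineno, "positions are not sorted"
        last_pos = pos
    return None
```

The offending line in the error above contains no tab at all, so this sketch would stop on it with "no integer POS in column 2".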

What was the command line? What is the output of file the.vcf? What are the names of the chromosomes in the reference?

(comment by Pierre Lindenbaum, 12 weeks ago)

I updated the question to answer all your questions.

(comment by User000440, 12 weeks ago)

Thank you.

What is the output of:

grep '^chr7B_long_arm' the.vcf | file -

And please show us one line of:

grep -m1 '^chr7B_long_arm' the.vcf

(comment by Pierre Lindenbaum, 12 weeks ago)

> These are the chromosome names

I don't see chr7B_long_arm here.

(comment by Pierre Lindenbaum, 12 weeks ago)

Actually, chr7B_long_arm is the name of the VCF file that does not work, not a contig name.

(comment by User000440, 12 weeks ago)

I solved the problem by going back to the original bgzipped file and decompressing it again; now it works. So something probably went wrong when I decompressed the files the first time.

(comment by User000440, 12 weeks ago)
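Because decompressing the original file again fixed the problem, the first decompression most likely produced a damaged copy. A cheap safeguard is to test the compressed file before decompressing it. bgzip output is a series of ordinary gzip members, so Python's gzip module can read it and raises an error on a CRC mismatch or truncated data; a complete BGZF file also ends with a fixed 28-byte empty block, the EOF marker described in the SAM/BAM format specification. The sketch below checks both. check_bgzip is a hypothetical helper, and it only inspects the trailing marker, not whether every block is valid BGZF.

```python
import gzip
import os
import zlib

# The 28-byte empty block that terminates a complete BGZF file (see the SAM/BAM spec)
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def check_bgzip(path, chunk_size=1 << 20):
    """Return a list of problems found in a bgzipped file (empty list: nothing detected)."""
    problems = []
    try:
        # Stream through every gzip member; the gzip module verifies each member's CRC32 and length
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:  # gzip.BadGzipFile is a subclass of OSError
        problems.append(f"decompression failed: {exc}")
    with open(path, "rb") as fh:
        size = fh.seek(0, os.SEEK_END)  # seek returns the new absolute position, i.e. the file size
        fh.seek(max(size - len(BGZF_EOF), 0))
        if fh.read() != BGZF_EOF:
            problems.append("missing BGZF EOF marker (truncated file, or plain gzip rather than bgzip)")
    return problems
```

On the command line, gzip -t file.vcf.gz runs a comparable integrity test, without the EOF-marker check.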
Powered by Biostar version 2.3.0