Question: Error while attempting to create a VennDiagram with bcftools stats
0
gravatar for caro-ca
5 weeks ago by
caro-ca20
caro-ca20 wrote:

Dear, I am trying to create a VennDiagram by using bcftools stats and consequently plot-vcfstats. For bcftools stats I followed:

bgzip -c file.vcf > file.vcf.gz
tabix -p vcf file.vcf.gz

But tabix raised this error:

[E::hts_idx_push] Unsorted positions on sequence #3: 84070 followed by 84069

Therefore, to sort I used this code:

(grep ^"#" file.vcf ; grep -v ^"#" file.vcf  | sort -k1,1 -k2,2n) > file1_sorted.vcf

Compressing and indexing worked fine for my two VCF files. However, while running

bcftools stats file1_sorted.vcf.gz file2_sorted.vcf.gz

This is the stdout message:

Failed to open file2_sorted.vcf.gz: unknown file type

I hope you could help me out. Thank you in advance

sort bcftools venndiagram • 164 views
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by caro-ca20
1

Can you try using bcftools sort? It is best to stay with tools designed for specific file formats in some cases (VCF being one).

ADD REPLYlink written 5 weeks ago by genomax89k

Yeah, I tried and this is the error:

bcftools sort samples_all_merged_STROPE.vcf -Ov samples_all_merged_STROPE_sorted.vcf.gz
Writing to /tmp/bcftools-sort.CWJKlF
[W::vcf_parse] Contig 'chrI' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrIII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrIV' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrIX' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrM' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrV' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrVI' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrVII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrVIII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrX' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXI' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXIII' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXIV' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXV' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chrXVI' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::bcf_write] Unchecked error (1), exiting
ADD REPLYlink written 5 weeks ago by caro-ca20

Can you give us the output of grep "^##" samples_all_merged_STROPE.vcf

ADD REPLYlink written 5 weeks ago by RamRS30k
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=INV,Description="Inversion">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample  Sample_1        Sample_2        Sample_3        Sample_4        Sample_5        Sample_6        Sample_7        Sample_8        Sample_9        Sample_10       Sample_11       Sample_12       Sample_13       Sample_14       Sample_15       Sample_16       Sample_17       Sample_18       Sample_19       Sample_20       Sample_21       Sample_22       Sample_23
       Sample_24       Sample_25       Sample_26       Sample_27       Sample_28       Sample_29       Sample_30       Sample_31       Sample_32
       Sample_33       Sample_34       Sample_35       Sample_36       Sample_37       Sample_38       Sample_39       Sample_40       Sample_41
       Sample_42       Sample_43       Sample_44       Sample_45       Sample_46       Sample_47       Sample_48       Sample_49       Sample_50
       Sample_51       Sample_52       Sample_53       Sample_54       Sample_55       Sample_56       Sample_57       Sample_58       Sample_59
       Sample_60       Sample_61       Sample_62       Sample_63       Sample_64       Sample_65       Sample_66       Sample_67       Sample_68
       Sample_69       Sample_70       Sample_71       Sample_72       Sample_73       Sample_74       Sample_75       Sample_76       Sample_77
       Sample_78       Sample_79       Sample_80       Sample_81       Sample_82       Sample_83       Sample_84       Sample_85       Sample_86
       Sample_87       Sample_88       Sample_89       Sample_90       Sample_91       Sample_92
##fileDate=20200812
##fileformat=VCFv4.1
##FORMAT=<ID=AAL,Number=1,Type=String,Description="Alternative allele sequence reported from input.">
##FORMAT=<ID=CO,Number=1,Type=String,Description="Coordinates">
##FORMAT=<ID=DR,Number=2,Type=Integer,Description="# supporting reference,variant reads in that order">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=ID,Number=1,Type=String,Description="Variant ID from input.">
##FORMAT=<ID=LN,Number=1,Type=Integer,Description="predicted length">
##FORMAT=<ID=PSV,Number=1,Type=String,Description="Previous support vector">
##FORMAT=<ID=QV,Number=1,Type=String,Description="Quality values: if not defined a . otherwise the reported value.">
##FORMAT=<ID=RAL,Number=1,Type=String,Description="Reference allele sequence reported from input.">
##FORMAT=<ID=ST,Number=1,Type=String,Description="Strand of SVs">
##FORMAT=<ID=TY,Number=1,Type=String,Description="Types">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=CIEND,Number=2,Type=String,Description="PE confidence interval around END">
##INFO=<ID=CIPOS,Number=2,Type=String,Description="PE confidence interval around POS">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=RE,Number=1,Type=Integer,Description="read support">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Indicating the direction of the reads with respect to the type and breakpoint.">
##INFO=<ID=SUPP,Number=1,Type=String,Description="Number of samples supporting the variant">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="Vector of supporting samples.">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the SV">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Method for generating this merged VCF file.">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of the SV.">
##source=SURVIVOR
~
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by caro-ca20

This file is really short compared to the non-sorted file, is there a way to upload files?

ADD REPLYlink written 5 weeks ago by caro-ca20

It tells me enough to diagnose what went wrong. There is no need to upload large files.

ADD REPLYlink written 5 weeks ago by RamRS30k

You should not have used the grep -> sort thing you used - that command sequence is flawed and it corrupted the file. Please use bcftools sort on the file.vcf that you used grep/sort on.

ADD REPLYlink written 5 weeks ago by RamRS30k

I used this code, but it did not work.

bcftools sort samples_all_merged_STROPE.vcf -Ov samples_all_merged_STROPE_sorted.vcf.gz

Am I using it correctly?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by caro-ca20

No, -Ov means you're expected uncompressed VCF as output, which you're saving in a vcf.gz file. Use either -Oz or save it as a .vcf file.

Can you give us the output to:

bcftools view -h samples_all_merged_STROPE.vcf | grep -m50 "^##contig"
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by RamRS30k

This is the output of bcftools

   bcftools sort samples_all_merged_STROPE.vcf -Ov samples_all_merged_STROPE_sorted.vcf
    Writing to /tmp/bcftools-sort.iB3jln
    [W::vcf_parse] Contig 'chrI' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrIII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrIV' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrIX' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrM' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrV' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrVI' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrVII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrVIII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrX' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXI' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXIII' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXIV' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXV' is not defined in the header. (Quick workaround: index the file with tabix.)
    [W::vcf_parse] Contig 'chrXVI' is not defined in the header. (Quick workaround: index the file with tabix.)
    [E::bcf_write] Unchecked error (1), exiting

and bcftools view does not retrieve anything.

ADD REPLYlink written 5 weeks ago by caro-ca20

To sum up:

  • I used bcftools sort , it did not work (stdout message above).
  • Consequently I used bcftools view -h samples_all_merged_STROPE.vcf | grep -m50 "^##contig" having no output nor error messages.
  • Lastly, I used bcftools view samples_all_merged_STROPE.vcf -Oz -o samples_all_merged_STROPE_sorted.vcf.gz having an output, but while executing bcftools index samples_all_merged_STROPE_sorted.vcf.gz this is the error message:

    [E::hts_idx_push] Unsorted positions on sequence #3: 84070 followed by 84069
    index: failed to create index for "samples_all_merged_STROPE_sorted.vcf.gz"
    

Thank you in advance for your persistent help.

ADD REPLYlink modified 5 weeks ago by RamRS30k • written 5 weeks ago by caro-ca20

It looks like your contig lines are missing - which is a HUGE problem. The grep output from a comment above shows me your header is already messed up. The conclusion is - the file samples_all_merged_STROPE.vcf is not usable in its current form.

How did you create the samples_all_merged_STROPE.vcf file? You're going to have to re-create it after fixing any problems in the process that created it.

ADD REPLYlink written 5 weeks ago by RamRS30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1100 users visited in the last hour