Question: Comparing two VCF files using VCFtools (vcf-compare)
bioguy24 (Chicago) wrote, 4.3 years ago:

I am comparing two VCF files using vcf-compare. The command below runs, but it is very slow, even though both files are indexed with tabix and are only ~2 MB each. Thank you :).

vcf-compare getrm_NA12878.vcf.gz NA12878.vcf.gz > compare

Is the file format an issue?

head -10 getrm_NA12878.vcf
##fileformat=VCFv4.1
##fileDate=Thu Mar 28 08:49:37 2013
##source= ConvertVCFGetRM to dump from ARUP_NA12878_exome.vcf
##fileDate=Thu Mar 28 10:02:59 2013
##source= ConvertVCFGetRM to dump from ARUP_NA12878_RainDance_Mito.vcf
##fileDate=Thu Mar 28 10:38:24 2013
##source= ConvertVCFGetRM to dump from /netmnt/traces04/sra-ftp-misc/development/get-rm/March_2013/BCM_MGL/VCF/processing_file/BCM-MGL_NA12878_PFIC.txt
##fileDate=Thu Mar 28 10:36:42 2013
##source= ConvertVCFGetRM to dump from /netmnt/traces04/sra-ftp-misc/development/get-rm/March_2013/BCM_MGL/VCF/processing_file/BCM-MGL_NA12878_GSD.txt

 head -10 NA12878.vcf
##fileformat=VCFv4.1
##fileDate=20160428
##fileUTCtime=2016-04-28T07:47:26
##source="tvc 5.0-13 (e975447) - Torrent Variant Caller"
##parametersName="Generic - Proton P1 or S5/S5XL (540) - Germ Line - Low Stringency"
##parametersDetails="germline_low_stringency_p1_540, TS version: 5.0"
##basecallerVersion="5.0-13/e975447"
##tmapVersion="5.0.13 (e975447) (201510291545)"
##reference=/results/referenceLibrary/tmap-f3/hg19/hg19.fasta
##phasing=none
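
On the file-format question, one thing worth ruling out is whether the two files name their contigs the same way (for example `1` versus `chr1`), since records on differently named contigs cannot be matched. The sketch below is only an illustration using standard Unix tools; the function names `vcf_contigs` and `contig_mismatch` are made up for this example and are not part of VCFtools or bcftools.

```shell
# List the distinct contig names used in the records of a VCF.
# Works on plain or gzip/bgzip-compressed files (gzip -cdf passes plain text through).
vcf_contigs() {
    gzip -cdf "$1" | grep -v '^#' | cut -f 1 | sort -u
}

# Print contig names found in only one of the two files.
# Column 1: only in the first file; column 2 (tab-indented): only in the second.
contig_mismatch() {
    a=$(mktemp)
    b=$(mktemp)
    vcf_contigs "$1" > "$a"
    vcf_contigs "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running `contig_mismatch getrm_NA12878.vcf.gz NA12878.vcf.gz` would print nothing if both files use the same set of contig names.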

I think vcf-compare has been superseded by a faster tool. The VCFtools Perl module page (http://vcftools.sourceforge.net/perl_module.html) notes: "A fast htslib C version of this tool is now available (see bcftools stats)."

written 4.3 years ago by Tonor430

That fixed it: the command now runs and writes a compare file, but the output does not look right.

bcftools stats getrm_NA12878.vcf.gz NA12878.vcf.gz > compare
[W::vcf_parse] INFO 'AC' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'AN' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'SF' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'GENE' is not defined in the header, assuming Type=String
[W::vcf_parse] FILTER 'TruthSensitivityTranche99.00to99.90' is not defined in the header
[W::vcf_parse] INFO 'DB' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DNA' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'RNA' is not defined in the header, assuming Type=String
[E::bcf_calc_ac] todo: 7 at 1:955597
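
The `[W::vcf_parse]` warnings in this log say that records use INFO keys (AC, AN, SF, ...) that have no matching `##INFO` declaration in the header, so they are assumed to be strings; the final `[E::bcf_calc_ac]` line may be related to AC being treated that way. A minimal sketch for listing such keys up front, using only `gzip`, `awk` and `sort`, is shown here. The function name `undeclared_info_keys` is hypothetical, and the script assumes a tab-separated VCF with INFO in column 8.

```shell
# Print INFO keys that occur in records but have no ##INFO=<ID=...> header line.
undeclared_info_keys() {
    gzip -cdf "$1" | awk -F '\t' '
        /^##INFO=<ID=/ {
            # Header declaration: keep the text between "ID=" and the next "," or ">".
            id = $0
            sub(/^##INFO=<ID=/, "", id)
            sub(/[,>].*$/, "", id)
            declared[id] = 1
            next
        }
        /^#/ { next }                       # skip all other header lines
        $8 != "." && $8 != "" {
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++) {
                key = fields[i]
                sub(/=.*/, "", key)         # drop "=value"; flags have no value
                if (key != "" && !(key in declared)) missing[key] = 1
            }
        }
        END { for (k in missing) print k }
    ' | sort
}
```

For example, `undeclared_info_keys getrm_NA12878.vcf.gz` would list the keys that need a header declaration. These could then be added to the header, for instance with `bcftools annotate --header-lines` if your bcftools version provides that option.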

compare (output)
# This file was produced by bcftools stats (1.3.1+htslib-1.3.1) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  getrm_NA12878.vcf.gz NA12878.vcf.gz
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID  0   getrm_NA12878.vcf.gz
ID  1   NA12878.vcf.gz
ID  2   getrm_NA12878.vcf.gz    NA12878.vcf.gz

Thank you :).

written 4.3 years ago by bioguy24
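
The compare excerpt in the last post shows only the set definitions: set 0 is the first file, set 1 the second, and set 2 lists both files. bcftools stats writes its summary counts on tab-separated lines that start with `SN`, followed by the set id, a key, and a value. A small sketch for pulling just those lines out of the output file follows; the helper name `stats_summary` is made up for illustration.

```shell
# Print the summary-number (SN) lines of a bcftools stats output file
# as "set id <TAB> key <TAB> value", dropping the leading "SN" column.
stats_summary() {
    grep '^SN' "$1" | cut -f 2-
}
```

Calling `stats_summary compare` after a successful run gives a quick view of the record counts per set. If it prints nothing, the run probably stopped before the summary section was written.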