Comparing two VCF files using vcftools (vcf-compare)
8.0 years ago
bioguy24 ▴ 230

I am comparing two VCF files with vcf-compare. The command below runs, but it is very slow, even though both files are indexed with tabix and are only ~2 MB each. Thank you :).

vcf-compare getrm_NA12878.vcf.gz NA12878.vcf.gz > compare

Is the file format an issue?

head -10 getrm_NA12878.vcf
##fileformat=VCFv4.1
##fileDate=Thu Mar 28 08:49:37 2013
##source= ConvertVCFGetRM to dump from ARUP_NA12878_exome.vcf
##fileDate=Thu Mar 28 10:02:59 2013
##source= ConvertVCFGetRM to dump from ARUP_NA12878_RainDance_Mito.vcf
##fileDate=Thu Mar 28 10:38:24 2013
##source= ConvertVCFGetRM to dump from /netmnt/traces04/sra-ftp-misc/development/get-rm/March_2013/BCM_MGL/VCF/processing_file/BCM-MGL_NA12878_PFIC.txt
##fileDate=Thu Mar 28 10:36:42 2013
##source= ConvertVCFGetRM to dump from /netmnt/traces04/sra-ftp-misc/development/get-rm/March_2013/BCM_MGL/VCF/processing_file/BCM-MGL_NA12878_GSD.txt

 head -10 NA12878.vcf
##fileformat=VCFv4.1
##fileDate=20160428
##fileUTCtime=2016-04-28T07:47:26
##source="tvc 5.0-13 (e975447) - Torrent Variant Caller"
##parametersName="Generic - Proton P1 or S5/S5XL (540) - Germ Line - Low Stringency"
##parametersDetails="germline_low_stringency_p1_540, TS version: 5.0"
##basecallerVersion="5.0-13/e975447"
##tmapVersion="5.0.13 (e975447) (201510291545)"
##reference=/results/referenceLibrary/tmap-f3/hg19/hg19.fasta
##phasing=none

I think vcf-compare has been superseded by a faster tool. The vcftools documentation (http://vcftools.sourceforge.net/perl_module.html) says: "A fast htslib C version of this tool is now available (see bcftools stats)."
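
Whichever tool is used, its counts can be cross-checked with standard Unix utilities when the files are as small as these. The following is only a rough sketch, not what vcf-compare or bcftools stats do internally: it reduces each record to a CHROM:POS:REF:ALT key and counts keys unique to each file and shared by both. The function names `site_keys` and `compare_sites` are made up for this example, and stripping a leading `chr` from chromosome names is an assumption added so that files using `1` and `chr1` can still match. Variants are not normalized, so differently represented indels will not pair up.

```shell
#!/bin/sh
# Print sorted, unique CHROM:POS:REF:ALT keys from a gzip/bgzip-compressed VCF.
# Assumption: a leading "chr" is dropped so "chr1" and "1" compare as equal.
site_keys() {
    gzip -dc "$1" |
        awk -F'\t' '!/^#/ { sub(/^chr/, "", $1); print $1 ":" $2 ":" $4 ":" $5 }' |
        LC_ALL=C sort -u
}

# Print three tab-separated counts: keys only in the first file,
# keys only in the second file, and keys present in both.
compare_sites() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    site_keys "$1" > "$keys_a"
    site_keys "$2" > "$keys_b"
    printf 'only_first\t%s\n'  "$(LC_ALL=C comm -23 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    printf 'only_second\t%s\n' "$(LC_ALL=C comm -13 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    printf 'shared\t%s\n'      "$(LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    rm -f "$keys_a" "$keys_b"
}

# Usage:
#   compare_sites getrm_NA12878.vcf.gz NA12878.vcf.gz
```

Because it only matches exact keys, treat the result as a sanity check rather than a replacement for a proper comparison.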


That fixed the slowness. The command now runs and writes a compare file, but the output does not look right:

bcftools stats getrm_NA12878.vcf.gz NA12878.vcf.gz > compare
[W::vcf_parse] INFO 'AC' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'AN' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'SF' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'GENE' is not defined in the header, assuming Type=String
[W::vcf_parse] FILTER 'TruthSensitivityTranche99.00to99.90' is not defined in the header
[W::vcf_parse] INFO 'DB' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DNA' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'RNA' is not defined in the header, assuming Type=String
[E::bcf_calc_ac] todo: 7 at 1:955597

compare (output)
# This file was produced by bcftools stats (1.3.1+htslib-1.3.1) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats  getrm_NA12878.vcf.gz NA12878.vcf.gz
#
# Definition of sets:
# ID    [2]id   [3]tab-separated file names
ID  0   getrm_NA12878.vcf.gz
ID  1   NA12878.vcf.gz
ID  2   getrm_NA12878.vcf.gz    NA12878.vcf.gz
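
The listing above stops at the set definitions. In a complete bcftools stats report, the per-set counts appear on tab-separated lines beginning with `SN`, keyed by the set IDs defined above (0 = sites only in the first file, 1 = sites only in the second, 2 = sites shared by both). A small sketch for pulling out the record count of each set; the function name `records_per_set` is made up for this example:

```shell
#!/bin/sh
# Print the "number of records:" summary value for every set ID
# found in a bcftools stats report (SN lines are tab-separated:
# SN <set id> <key> <value>).
records_per_set() {
    awk -F'\t' '$1 == "SN" && $3 == "number of records:" { print "set " $2 ": " $4 }' "$1"
}

# Usage:
#   records_per_set compare
```

If this prints nothing, the report contains no `SN` lines at all, which suggests bcftools stopped before writing any statistics.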

Thank you :).
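
One plausible reading of the log: the warnings show that tags such as AC and AN are not declared in a header (most likely that of getrm_NA12878.vcf.gz, judging by its header excerpt), so they are parsed as strings, and the `[E::bcf_calc_ac]` line is where bcftools gives up, leaving a report with set definitions only. A minimal sketch of one way to address this is to write the missing declarations to a file and add them with `bcftools annotate -h`. The AC, AN and DB definitions follow the reserved keys in the VCF specification; the Number, Type and Description values for SF, GENE, DNA, RNA and the FILTER entry are guesses and should be checked against the actual records. The function name and output file names are hypothetical.

```shell
#!/bin/sh
# Emit header declarations for the tags named in the bcftools warnings.
# AC/AN/DB: reserved definitions from the VCF specification
#   (switch DB to Type=String if the records carry DB=<value>).
# SF/GENE/DNA/RNA and the FILTER line: assumed definitions, adjust as needed.
write_missing_header_lines() {
    cat <<'EOF'
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
##INFO=<ID=SF,Number=.,Type=String,Description="Assumed: source file(s)">
##INFO=<ID=GENE,Number=.,Type=String,Description="Assumed: gene name">
##INFO=<ID=DNA,Number=.,Type=String,Description="Assumed: DNA-level annotation">
##INFO=<ID=RNA,Number=.,Type=String,Description="Assumed: RNA-level annotation">
##FILTER=<ID=TruthSensitivityTranche99.00to99.90,Description="Assumed: GATK VQSR tranche filter">
EOF
}

# Usage (assumes bcftools and tabix are on PATH):
#   write_missing_header_lines > extra_header.txt
#   bcftools annotate -h extra_header.txt -O z -o getrm_NA12878.fixed.vcf.gz getrm_NA12878.vcf.gz
#   tabix -p vcf getrm_NA12878.fixed.vcf.gz
#   bcftools stats getrm_NA12878.fixed.vcf.gz NA12878.vcf.gz > compare
```

Even with clean headers, the two files can only share sites if they name chromosomes the same way; the error position `1:955597` shows one file using `1`, so it is worth checking whether the hg19-based file uses `chr1`.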
