why there is 0/0 genotyp in VCF file?
0
2
Entering edit mode
5.2 years ago
star ▴ 350

I have multiple VCF files (they are not multiple-sample VCF) and I just looked the genotypes and I saw there are 4 genotypes for each variant like: 0/0, 0/1, 1/1, ./. . The last three genotypes are clear to me, but I don`t understand why we have 'Hom_ref genotype' in the VCF files? And shall I ignore them?

VCF GATK Variant • 2.8k views
ADD COMMENT
3
Entering edit mode

Was this VCF generated from just a single sample or was it created by a multi-sample VCF being split? If it's the latter, the presence of 0/0 has a more obvious reason. If it's the former, the variant calling algorithm needs to be looked into to understand the criteria for a hom-ref call to be made.

ADD REPLY
0
Entering edit mode

I guess it is only one sample.

here is the header of VCF:

##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Tue Aug 06 15:56:24 EDT 2018",Epoch=1565122164357,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[df.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[chr2] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/reference/hg19/hg19.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false out=/df.dbsnp.vcf.gz likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name=dbsnp source=allels.sorted.vcf.gz) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[] excludeAnnotation=[] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=10.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=GENOTYPE_GIVEN_ALLELES alleles=(RodBinding name=alleles source=allels.sorted.vcf.gz) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=true keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
ADD REPLY
2
Entering edit mode
ADD REPLY
0
Entering edit mode

Thanks but my VCF is a single sample VCF file.

ADD REPLY
0
Entering edit mode

In most situations, if you see a 0/0, it will appear in a multi-sample VCF at a site where a HET or HOM variant was called in another sample at the same site.

How does one account for single sample hom-ref calls though?

ADD REPLY
0
Entering edit mode

The occurrence of 0/0 in a single sample VCF may be caused by low-quality mapping.

ADD REPLY
0
Entering edit mode

Can you expand on this a little? I'm curious how low quality mapping can result in hom-ref calls.

ADD REPLY
0
Entering edit mode

Actually, I came across a freebayes issue with Homo ref call, forgot to reference it. https://github.com/ekg/freebayes/issues/517

ADD REPLY
2
Entering edit mode

That issue does not reach the conclusion that a 0/0 is caused by low quality mapping. Low quality would result in ./. if anything, IMO.

ADD REPLY
0
Entering edit mode

We don`t get ./. genotype, if there is low Q mapping?

ADD REPLY

Login before adding your answer.

Traffic: 2205 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6